Principles of Dynamic Languages

Slides from the 10-week course I taught in the Computer Science Masters Degree program at the University of Chicago in 2008. Fairly opinionated and off the deep end. If you ever wanted to attend a 30-hour tutorial, this is for you.

David Beazley

January 01, 2008

Transcript

  1. Principles of Dynamic Languages

    CSPP51060, Winter '08, University of Chicago. David Beazley (http://www.dabeaz.com). Copyright (C) 2008, http://www.dabeaz.com
  2. A Quote

    “There are two types of programming languages--those that everyone hates and those that nobody uses.” - John Ousterhout (overheard at a conference)

    This course is mostly about the first category...
  3. Book Sales (Q1'07)

    Source: "State of the Computer Book Market Q1 07" - Mike Hendrickson

        Language        Sales
        Java            63136
        C#              52655
        Javascript      48266
        PHP             41933
        C/C++           41311
        Visual Basic    26385
        Ruby            25380
        SQL             22188
        Perl            10308
        Python           9909
  4. Questions

    • What is a "dynamic programming language?"
    • What are they used for?
    • Where did they come from?
    • How do you use them?
  5. Language Classification

    • The terminology is somewhat imprecise
    • Static programming languages (a.k.a., "serious" programming languages): C, C++, C#, Java, Ada, Pascal, etc.
    • Dynamic programming languages (a.k.a., "hacky" programming languages): Perl, Python, Ruby, Tcl, Javascript, PHP, etc.
  6. What is the Difference?

    • Clearly there is some kind of distinction
    • Other than dynamic languages often being derided by "real programmers"
    • Let's look at a simple programming problem...
  7. Programming Problem

    • Dave's Mortgage

      Dave has taken out a $500,000 mortgage from Guido's Mortgage, Stock, and Viagra trading corporation. He got an unbelievable rate of 4% and a monthly payment of only $499. However, Guido, being kind of soft-spoken, didn't tell Dave that after 2 years, the rate changes to 9% and the monthly payment becomes $3999.

    • Question: How much does Dave pay and how many months does it take?
  8. Question

    • How do we write a program to solve this problem?
    • As this is a computer science course, let's use a "serious" programming language
    • For example : C
  9. Solution (ANSI C)

        #include <stdio.h>

        int main(int argc, char *argv[]) {
            double principle = 500000;
            double payment = 499;
            double rate = 0.04;
            int month = 0;
            double total_paid = 0;
            while (principle > 0) {
                principle = principle*(1+rate/12) - payment;
                total_paid += payment;
                month += 1;
                if (month == 24) {
                    payment = 3999;
                    rate = 0.09;
                }
            }
            printf("Total paid %0.2f\n", total_paid);
            printf("Months %d\n", month);
            return 0;
        }
  10. Or if you prefer, Java...

        public class Mortgage {
            public static void main(String[] args) {
                double principle = 500000;
                double payment = 499;
                double rate = 0.04;
                int month = 0;
                double total_paid = 0;
                while (principle > 0) {
                    principle = principle*(1+rate/12) - payment;
                    total_paid += payment;
                    month += 1;
                    if (month == 24) {
                        payment = 3999;
                        rate = 0.09;
                    }
                }
                System.out.println("Total paid " + total_paid);
                System.out.println("Months " + month);
            }
        }
  11. Compilation

    • In "serious" languages, programs are compiled

        shell % cc mortgage.c -o mortgage.exe
        shell %
        shell % javac Mortgage.java
        shell %

    • Requires the use of a compiler/development environment (gcc, Visual Studio, etc.)
    • Produces an executable/class file that is separate from the original source code
    • That is what you use to run the program
  12. Sample Output

    • Running the programs

        shell % mortgage.exe
        Total paid 2623323.00
        Months 677
        shell %

        shell % java Mortgage
        Total paid 2623323.0
        Months 677
        shell %
  13. More on Compilation

    • Compilation is a one-time operation. When you want to run the program, you just use the output of the compiler (e.g., the .exe file)
    • If you want to make any change to the program, the source must be recompiled.
    • Edit/compile/run/debug cycle.
  14. More on Compilation

    • Compilers perform extensive error checking/validation.
    • Goal is to find errors before the program runs (reported as compiler errors)
    • To do this, programs include extra specifications that are used to perform these checks.
    • Usually associated with "type-checking"
  15. Type Checking

    • All data/variables have a fixed "type" (static)

        int main(int argc, char *argv[]) {
            double principle = 500000;
            double payment = 499;
            double rate = 0.04;
            int month = 0;
            double total_paid = 0;
            ...

    • Inconsistent use results in an error

        if (month == 24) {
            payment = "A lot";
            rate = 0.09;
        }

        mortgage.c:15: error: incompatible types in assignment
  16. Type Checking

    • All functions/methods have prototypes

        double square(double x) {
            return x*x;
        }

    • Inconsistent use results in errors

        double y = square(3,4)        // Error. Too many args
        double y = square("Hello")    // Error. Bad arg type
        char *z = square(4.0)         // Error. Bad return type

    • Emphasize: Errors caught during compilation
  17. Static Languages

    • In compiled languages, the main focus is the compiler.
    • Compiler produces executables, performs validation, reports errors, performs various kinds of optimizations, etc.
    • The result is a "static" program: a program whose functionality is rigidly fixed at the time of compilation and that cannot be changed without recompiling.
  18. Static Languages

    • Since "static" programs have been successfully compiled, you are reasonably sure that they are free from certain kinds of errors (especially inconsistent use of data).
    • (Of course, there may be other bugs)
    • Since a compiler provides a framework for analyzing programs, a lot of serious computer science has focused on this.
  19. Dynamic Languages

    • A main feature of "dynamic" languages is that they get rid of separate compilation
    • You write programs (usually without worrying about low-level details).
    • You then just "run" the program.
    • Let's look at an example...
  20. Solution (Python)

        # mortgage.py
        principle = 500000
        payment = 499
        rate = 0.04
        month = 0
        total_paid = 0
        while principle > 0:
            principle = principle*(1+rate/12) - payment
            total_paid += payment
            month += 1
            if month == 24:
                payment = 3999
                rate = 0.09
        print "Total paid %0.2f" % total_paid
        print "Months %d" % month
  21. Sample Output

    • Running a Python program

        shell % python mortgage.py
        Total paid 2623323.00
        Months 677
        shell %

    • The program is executed by an interpreter (python) that reads statements from the input program and runs them one after the other.
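
The reported figures can be cross-checked by hand, because the payment changes only once: 24 teaser payments followed by full payments for the remaining months. A minimal sketch in Python 3 syntax (the slides themselves use Python 2 print statements); the variable names are illustrative and not taken from the slides.

```python
# Cross-check of the sample output: 24 teaser payments, then full payments.
teaser_months = 24
total_months = 677                      # month count reported by the program
teaser_total = teaser_months * 499      # paid during the teaser period
full_total = (total_months - teaser_months) * 3999
total_paid = teaser_total + full_total
print(teaser_total, full_total, total_paid)
```

The sum matches the "Total paid" line printed by the C, Java, and Python versions.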
  22. Some Observations

    • There is no separate compilation. You just run Python on the program. The source code is the program.
    • If you make changes, they show up next time.
    • You don't have to package code into a main() function or anything similar to that.
    • A program can be just a sequence of statements.
  23. More on Interpreters

    • Interpreters delay error checking/validation to run-time. As a result, programs don't generally involve explicit "type" declarations

        principle = 500000
        payment = 499
        rate = 0.04
        month = 0
        total_paid = 0

      Notice how none of these variable assignments has a "type"
    • One consequence: Programs in dynamic languages tend to involve much less typing (at the keyboard)
  24. Dynamic Type Checking

        x = 42          # x is an integer
        ...
        x = "hello"     # x is now a string (OK)

    • In dynamic languages, variables are not restricted to a single type of data
    • The type of a "variable" is associated with whatever value is currently assigned to the variable---it may change while running!
    • This is very different than C/C++/Java.
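
To make the point concrete, the type can be inspected at run time and follows the value rather than the name. A small sketch in Python 3 syntax; `first_type` and `second_type` are illustrative names, not from the slides.

```python
x = 42
first_type = type(x).__name__     # the type belongs to the value 42
x = "hello"
second_type = type(x).__name__    # same name, now bound to a string
print(first_type, second_type)
```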
  25. Dynamic Type Checking

    • All operations involve run-time checks

        z = x + y       # Succeeds if x + y makes sense

        x = 37
        y = 42
        z = x + y       # Ok. z = 79

        x = "Hello"
        y = "World"
        z = x + y       # Ok. z = "HelloWorld"

        x = 37
        y = "World"
        z = x + y       # Error!

      This operation fails because the two operands are incompatible (number and string)
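
The three cases can be tried directly. The sketch below (Python 3 syntax) wraps the addition in a hypothetical helper `add` and catches the failure, to show that the check happens only when the `+` is actually executed.

```python
def add(x, y):
    # No declared types: compatibility is checked when + actually runs.
    return x + y

print(add(37, 42))
print(add("Hello", "World"))
try:
    add(37, "World")
except TypeError as exc:
    error_message = str(exc)
    print("Run-time error:", error_message)
```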
  26. Interactive Console

    • Since dynamic languages do everything at run-time, the interpreters can often be used interactively (like a shell)

        shell % python
        Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
        [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
        >>> 3 + 4
        7
        >>> print "Hello World"
        Hello World
        >>>

    • Sometimes known as a "read-eval-print" loop (REPL)
  27. Dynamic == Run time

    • As a general rule, when a computer scientist talks about some part of a program being "dynamic", it means that it occurs while the program runs.
    • Dynamic Typing - Type checking at run time.
    • Dynamic Binding - Virtual methods in OO
    • Dynamic Linking - Run-time linking of program modules/libraries
  28. Some Complaints

    • Performance. Dynamic programs run much slower than static programs because they perform all of the error-checking as the program runs.
    • Systems. Hard to do low-level hacking of the hardware (e.g., device drivers)
    • Validation. Since errors are not detected until a program runs, programs may have hidden/obscure errors (that would have been caught by a compiler).
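
The validation complaint is easy to reproduce. In the sketch below (Python 3 syntax; `describe_payment` is a hypothetical function, not from the slides) a type error sits in a branch that only runs from month 24 onward, so the program works until that branch is finally reached.

```python
def describe_payment(month, payment):
    if month < 24:
        return "Teaser payment: %d" % payment
    # Bug: str + int. Nothing reports it until this line actually runs.
    return "Full payment: " + payment

print(describe_payment(1, 499))      # works for the first 23 months
try:
    describe_payment(24, 3999)
    failed = False
except TypeError:
    failed = True
print("Latent bug triggered:", failed)
```

A C or Java compiler would reject the equivalent assignment before the program ever ran, which is the trade-off the slide describes.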
  29. Benefits

    • Dynamic languages have some benefits.
    • Rapid Development. Languages are high-level. Programs assembled from components
    • Scripting. Complex applications can be controlled by programmable scripts that can be changed without having to recompile
    • Flexibility. Programs can be easily changed/reconfigured.
  30. Benefits

    • Ease of use. Dynamic languages are often better suited for end-users. They do not require users to worry about low-level implementation details (such as types).
    • Portability. Languages are often so high level, they work easily across different machines.
    • Prototyping. Often significantly easier to prototype a system in a dynamic language.
  31. Applications

    • Dynamic languages are used almost everywhere--often behind the scenes.
    • Internet (Google, Web, etc.)
    • Movie-making (special effects)
    • Television (control systems)
    • Scientific computing (supercomputing)
    • Robots
    • Video games
  32. Commentary

    • Dynamic languages are often viewed as exotic, unreliable, and "unserious" by managers and crusty software engineers
    • In many cases, their use in an organization is subversive (initiated by lone programmers, interns, students, etc.)
    • In certain cases, the use of such languages is considered to be a strategic advantage (a.k.a., a "trade secret").
  33. The Real Secret

    • Programmers are not using dynamic languages as a replacement for C++ or Java.
    • They're using these languages in addition to static languages
    • They're writing programs that utilize the strengths of both (e.g., C++ for speed, dynamic languages for flexibility).
  34. Part 2

    The history of dynamic languages (Where did these languages come from?)
  35. Prehistory

    • In the early days, computers were big, very expensive, and quite limited in power (your cell-phone has far more compute power).
    • Early programs written directly on the hardware (hard-wired, machine language).
    • Later, assembly language.
    • Fed to systems on punch-cards
  36. Programming Languages

    • The first "high-level" programming languages
    • Fortran (1954)
    • Lisp (1958)
    • ALGOL (1958)
    • COBOL (1959)
    • Each of these efforts came out of different communities (Fortran - Engineering/Science, Lisp - Mathematics, COBOL - Business)
  37. Programming Languages

    • In the early days, no one really knew exactly what they were doing
    • "Computer Science" didn't even emerge as a separate discipline until the 1960s
    • Aspects of "programming languages" were still being worked out.
  38. Fortran

    • A language primarily meant to replace hand-coding of assembly language
    • Highly focused on raw performance for science/engineering work
    • Initially developed by IBM around 1954
    • Still used today for that same purpose (Fortran 2008 standard underway).
  39. Example (Fortran)

        C     CALCULATE DAVE'S MORTGAGE
              PROGRAM MORTGAGE
              PRINCIPLE = 500000.0
              PAYMENT = 499.0
              RATE = 0.04
              NMONTHS = 0
              TOTALPAID = 0
        10    PRINCIPLE = PRINCIPLE*(1+RATE/12.0)-PAYMENT
              TOTALPAID = TOTALPAID + PAYMENT
              NMONTHS = NMONTHS + 1
              IF (NMONTHS .EQ. 24) THEN
                 PAYMENT = 3999.0
                 RATE = 0.09
              END IF
              IF (PRINCIPLE .GT. 0) GO TO 10
              WRITE(*,*) 'TOTAL PAID', TOTALPAID
              WRITE(*,*) 'MONTHS', NMONTHS
              STOP
              END
  40. Fortran

    • A bizarre language in many ways
    • Example: Implicit Typing

        NMONTHS = 0        (an integer)
        TOTALPAID = 0      (a real)

    • Type determined by first character of the name (I-N are ints, all others are reals)
    • And this only scratches the surface.
    • Yet, compare the "look" of early Fortran to modern scripting languages
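
The I-N rule is simple enough to state as a function. This is only an illustrative sketch in Python of the default rule as the slide describes it; `implicit_type` is a made-up name, and real Fortran lets an IMPLICIT statement or explicit declarations override the default.

```python
def implicit_type(name):
    """Return the default implicit Fortran type for a variable name (I-N rule)."""
    first = name[0].upper()
    return "INTEGER" if "I" <= first <= "N" else "REAL"

for name in ["NMONTHS", "TOTALPAID", "PRINCIPLE", "PAYMENT", "RATE"]:
    print(name, implicit_type(name))
```

This is why the Fortran example counts months in `NMONTHS`: a variable called `MONTHS` would also be an integer, but one called `TOTALMONTHS` would silently be a real.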
  41. Side by Side (Visual)

    [Two-column slide: the Fortran mortgage program (left) and the Python mortgage.py program (right), both cropped at the right edge, shown next to each other to compare their visual appearance.]
  42. ALGOL

    • Developed by a committee of scientists around 1958 (ETH-Zurich)
    • Initial motivation was to address perceived problems with FORTRAN (of which there were many)
    • Was hugely influential in subsequent computer science research on programming languages, type-systems, compilers, etc.
  43. Example (ALGOL-60)

        begin
          comment Calculate Dave's mortgage;
          real principle,rate,payment,totalpaid;
          integer month;
          principle := 500000;
          rate := 0.04;
          payment := 499;
          totalpaid := 0;
          month := 0;
        next:
          principle := principle*(1+rate/12) - payment;
          totalpaid := totalpaid + payment;
          month := month + 1;
          if month = 24 then
          begin
            rate := 0.09;
            payment := 3999
          end;
          if principle > 0 then go to next;
          outstring(1,"Total paid ");
          outreal(1,totalpaid);
          outstring(1,"\nMonths ");
          outinteger(1,month);
          outstring(1,"\n")
        end
  44. Example (ALGOL-60)

    Same program as the previous slide, with the type declarations called out:

        real principle,rate,payment,totalpaid;
        integer month;
  45. ALGOL

    • Virtually all modern programming languages utilize concepts that were worked out in various versions of ALGOL.
    • Very strong focus on the design of compilers
    • However, ALGOL itself never really caught on commercially (legacy ALGOL?)
    • The language never offered any standard I/O facilities (different on every machine)
  46. COBOL

    • A language used widely in business/finance
    • Developed in 1959 by committee (Burroughs, IBM, Honeywell, RCA, Sperry Rand, Sylvania, USAF, NIST, etc.)
    • Still lives today and has all of the features that you would expect from such a committee effort.
  47. Example (COBOL)

        IDENTIFICATION DIVISION.
        PROGRAM-ID. MORTGAGE.
        DATA DIVISION.
        WORKING-STORAGE SECTION.
        01 PRINCIPLE PIC S9(7)V99 VALUE 500000.00 .
        01 PAYMENT   PIC 9(7)V99  VALUE 499.00 .
        01 RATE      PIC 9V99     VALUE 0.04 .
        01 MONTH     PIC 999      VALUE 0 .
        01 TOTALPAID PIC 9(7)     VALUE 0.00 .
        PROCEDURE DIVISION .
        MAIN.
            PERFORM WITH TEST BEFORE UNTIL PRINCIPLE < 0.00
                COMPUTE PRINCIPLE = PRINCIPLE*(1+RATE/12)-PAYMENT
                ADD PAYMENT TO TOTALPAID
                ADD 1 TO MONTH
                IF MONTH = 24 THEN
                    SET PAYMENT TO 3999.00
                    SET RATE TO 0.09
                END-IF
            END-PERFORM
            DISPLAY "TOTAL PAID", TOTALPAID
            DISPLAY "MONTHS", MONTH.
            STOP RUN.
  48. COBOL

    "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense." - Edsger Dijkstra

    • So, let's move on...
  49. Lisp

    • Conceived as a language for writing computer programs based on the lambda calculus (Alonzo Church)
    • Invented by John McCarthy at MIT (1958)
    • Name derives from "List Processing Language"
    • Many modern variations in use (Common Lisp, Scheme, etc.)
  50. Example (Scheme)

        (define (daves-mortgage principle payment rate total month)
          (if (> principle 0.0)
              (daves-mortgage (- (* principle (+ 1 (/ rate 12))) payment)
                              (if (= month 24) 3999.0 payment)
                              (if (= month 24) 0.09 rate)
                              (+ total payment)
                              (+ 1 month))
              (cons total (- month 1))))

        (display (daves-mortgage 500000.0 499.0 0.04 0.0 1))
        (newline)

    (and this solution is pretty clunky---Lisp programmers would be more clever about it)
  51. Lisp

    • Lisp is truly unlike any of the other early programming languages
    • The entire language is basically based on the "list." Lisp programs themselves are lists (thus programs can process their own code as data).
    • Programs written as functions that apply various operations to lists (functional programming). Especially strong reliance on recursive functions, mathematical thinking.
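
The "programs are lists" idea can be imitated in Python with nested lists. The sketch below is an assumption-laden toy, not how any real Lisp is implemented: `evaluate` and `OPS` are made-up names, and only the four arithmetic operators are supported.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(expr):
    """Recursively evaluate a nested list such as ["+", 1, ["*", 2, 3]]."""
    if not isinstance(expr, list):
        return expr                       # a number evaluates to itself
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result

# One month of the mortgage: (- (* principle (+ 1 (/ rate 12))) payment)
program = ["-", ["*", 500000.0, ["+", 1, ["/", 0.04, 12]]], 499.0]
print(evaluate(program))

# Because the program is an ordinary list, other code can rewrite it.
program[2] = 3999.0                       # swap in the post-teaser payment
print(evaluate(program))
```

The last two lines are the interesting part: the program is edited with ordinary list operations before being run again, which is the code-as-data property the slide refers to.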
  52. Lisp

    • Lisp is the first major dynamic language
    • Almost all major concepts of dynamic programming languages were first invented with Lisp.
    • Modern languages like Python and Ruby borrow heavily from Lisp, but even today, have not replicated all of its features.
    • All "real programmers" eventually reinvent some part of Lisp without knowing it.
  53. Lisp Criticisms

    • In early computing, machines were quite limited and Lisp was much slower than the compiled languages (dynamic)

      "A LISP programmer knows the value of everything, but the cost of nothing" - Alan Perlis

    • Lisp was not widely adopted by those who were obsessed with high-performance (science/engineering/business).
  54. Lisp Criticisms

    • Understanding Lisp requires a certain degree of mathematical sophistication. Functions, composition of functions, recursion, etc.
    • Let's be honest---a huge majority of the world's programmers are not mathematicians. You don't need a math degree to write accounting software (or make a web page).
    • Programs in other languages are more like "recipes" of steps (imperative). Conceptually, this is easier for most people to grasp.
  55. Interlude

    • The programming languages of choice for "serious" applications have almost all been compiled programming languages that derive from Fortran/Algol.
    • Some major languages from the 70s/80s
    • Fortran (updates, F66, F77).
    • Pascal (1970)
    • C (1972)
  56. Pascal

    • Developed in 1970 by Niklaus Wirth
    • Derives from ALGOL, but strongly focused on structured programming/data structures
    • Initially developed as a teaching language for structured programming.
    • Personal note : I remember learning Pascal in high school (1985). Most programmers in 70's/80's would have encountered it.
  57. Example (Pascal)

        program Mortgage(output);
        var
          principle : Real = 500000;
          rate : Real = 0.04;
          payment : Real = 499;
          month : Integer = 0;
          totalpaid : Real = 0;
        begin
          while principle > 0 do
          begin
            principle := principle*(1+rate/12) - payment;
            totalpaid := totalpaid + payment;
            month := month + 1;
            if month = 24 then
            begin
              rate := 0.09;
              payment := 3999
            end
          end;
          writeln('Total Paid ', totalpaid);
          writeln('Months ', month)
        end.
  58. Using Pascal

    • Pascal was very picky about "correctness"
    • Very strong type system, very strict in what it allowed and did not allow (pitched as a good teaching language)
    • Used as an alternative to BASIC on early PCs (e.g., Turbo Pascal, UCSD Pascal, etc.).
    • Early Macintosh systems made heavy use of Pascal (parts of the OS, major applications)
  59. C

    • Developed in 1972 at AT&T Bell Labs by Dennis Ritchie in order to implement Unix
    • Created as a systems implementation language (a better assembly language)
    • Although the language borrows some ideas from ALGOL, the language was always meant to be minimal and low-level.
  60. Some Background

    • Despite developments in programming languages, there are situations where programs need to directly manipulate the computer hardware
    • Operating systems, device drivers, etc.
    • Before C, this code would typically be written in assembly language
  61. Example (Assembly)

        .principle: .double 500000
        .payment:   .double 499
        .rate:      .double 0.04
        main:
            leal    4(%esp), %ecx
            andl    $-16, %esp
            pushl   -4(%ecx)
            pushl   %ebp
            movl    %esp, %ebp
            pushl   %ecx
            subl    $68, %esp
            fldl    .principle
            fstpl   -48(%ebp)
            fldl    .payment
            fstpl   -40(%ebp)
            fldl    .rate
            fstpl   -32(%ebp)
            movl    $0, -12(%ebp)
            fldz
            ... etc ...
  62. The C Language

    • C was not created to be a better ALGOL. It was a replacement for assembly coding.
    • Although it was compiled, "safety" was never really a primary concern
    • C allowed direct access to hardware/memory (the complete opposite of Pascal)
    • Witness the consequences : Buffer overflow attacks (malware).
  63. C Adoption

    • C/C++ is currently the de-facto standard for developing systems software
    • There are many reasons why this happened
    • Not just related to the technical merits (or lack of) of C as a programming language
  64. More Background

    • Development of minicomputers/microcomputers in the 1970s
    • These systems were extremely minimal/resource starved (compare a Commodore-64 with an IBM mainframe).
    • If you wanted anything to run fast and fit in memory, you had to write it in assembly.
    • A lot of early PC software was assembly
  65. Growth of C

    • Use of C really exploded with minis/PCs
    • C was minimal, portable, and didn't enforce any morality rules on programming (you could do anything you wanted).
    • You could easily write programs that ran almost as fast as hand-written assembler (maybe even faster with optimization).
    • Growth completely driven by practical applications (and economics)
  66. C vs. Pascal

    • I don't know when C clobbered Pascal, but I'm guessing in the late 80s (I don't remember many people talking about Pascal after about 1990).
    • Humor : "How to shoot yourself in the foot"

        C : "You shoot yourself in the foot."
        Pascal : "The compiler won't let you shoot yourself in the foot."
  67. Problems with C

    • Despite C becoming dominant, it was extremely crippled in certain ways
    • Not just C, but almost all traditional programming languages
  68. Interactive Programs

    • Computers had become a lot more interactive
    • Early 1970's : Interactive video terminals replaced punch cards. Enabled programs to interact with the user in new ways. (shells)
    • Early 1980's : Graphical User Interfaces. A lot of previous research (e.g., Xerox), but the Apple Lisa/Mac was really the first GUI-centric system.
  69. Compute Power

    • Increased computing power
    • Rapid growth of CPU power, memory capacity, and disk storage
    • Enabled new kinds of programs with vastly more complexity than anything before.
    • GUIs took it to a whole new level
  70. Software Complexity

    • Structured programming was not enough from the standpoint of software engineering.
    • How to manage large-scale programming projects and complexity?
    • Rise of object-oriented programming, software components, etc. (1980s).
    • Example : Development of C++ as a "better C"
  71. Software Components

    • Much greater reliance on programming libraries, pre-built software components
    • Example: GUI "widgets"
    • Writing software was becoming less about creating everything from scratch and more about gluing together components that already existed.
  72. A Quote

    "It seems clear that languages somewhat different from those in existence today would enhance the preparation of structured programs. We will perhaps eventually be writing only small modules which are identified by name as they are used to build larger ones, so that devices like indentation [...] might become feasible for expressing local structure in the source language." - Donald Knuth (1974)
  73. BASIC

    • Virtually all early home computers came prepackaged with BASIC (usually Microsoft)
    • If you turned on the system, you were often dropped directly into a BASIC interpreter
    • Greatly expanded the number of programmers
    • Notion of "programmability" (maybe BASIC's only redeeming quality other than peek/poke)
  74. Internet

    • Internet was growing (1980's)
    • Major universities/corporations were already connected (ARPANET)
    • Services for home users (CompuServe, BBS, etc.).
    • Growth of free software/open source
    • Sharing of ideas and source code
  75. Setting the Stage

    • By the mid 1980's, there were a lot of things going on (PCs, GUIs, faster systems, early Internet, objects, etc.).
    • Programmers using C/Pascal, but there were many perceived limitations.
    • A cauldron of activity.
  76. Part 3

    From C to Scripting (Or how application programmers reinvented Lisp)
  77. Programming

    • Programmers usually write programs to solve problems.
    • Problems that are of interest to people who are usually not programmers
    • Therefore, it is important to figure out some way to make programs generally usable (i.e., "user friendly").
  78. A Simple C Program

        #include <stdio.h>

        int main(int argc, char *argv[]) {
            double principle = 500000;
            double payment = 499;
            double rate = 0.04;
            int month = 0;
            double total_paid = 0;
            while (principle > 0) {
                principle = principle*(1+rate/12) - payment;
                total_paid += payment;
                month += 1;
                if (month == 24) {
                    payment = 3999;
                    rate = 0.09;
                }
            }
            printf("Total paid %0.2f\n", total_paid);
            printf("Months %d\n", month);
            return 0;
        }
  79. A Simple C Program

    Same program as the previous slide, annotated:

    • Variables are hard-coded with values.
    • Program logic is hard-coded too.
  80. Problem

    • The previous program just isn't that useful
    • Everything (including the underlying logic) is hard-coded into the program
    • Despite the possible merits of keeping a programmer employed full-time to make changes, the program isn't reusable
    • Thus, an important part of software engineering is how to make code more general purpose
  81. Reusability

    • As a general rule, software engineers do not like to write programs where everything is just hard-coded.
    • I'd deduct points if you turned in a big programming assignment and you did this.
    • An extreme example : A former student (nameless) when asked to create a website that played "tic tac toe" tried to create a separate .html document for every possible game configuration (of which there are many)
  82. User-Defined Parameters

        #define PRINCIPLE 500000
        #define PAYMENT 3999
        #define RATE 0.09
        #define TEASER_PAYMENT 499
        #define TEASER_RATE 0.04
        #define TEASER_PERIOD 24

        int main(int argc, char *argv[]) {
            double payment = TEASER_PAYMENT;
            double rate = TEASER_RATE;
            ...
            if (month == TEASER_PERIOD) {
                payment = PAYMENT;
                rate = RATE;
            }
            ...
        }
  83. User-Defined Parameters

    Same code as the previous slide, annotated:

    • Problem parameters are specified in one location using symbolic names.
    • Use the symbolic names in the later code
  84. Commentary

    • Using defined constants makes it easier to change the code. If you want to make changes to parameters, you just change them in one location.
    • However, it's still not very user-friendly. To change a parameter, you have to recompile.
    • "Pardon me, I'll tell you how much your mortgage will cost as soon as I finish recompiling my mortgage software."
  85. Reading User Input

        int main(int argc, char *argv[]) {
            double principle, payment, rate;
            double teaser_payment, teaser_rate;
            int teaser_period;
            scanf("%lf",&principle);
            scanf("%lf",&payment);
            scanf("%lf",&rate);
            scanf("%lf",&teaser_payment);
            scanf("%lf",&teaser_rate);
            scanf("%d",&teaser_period);
            ...
        }
  86. Copyright (C) 2008, http://www.dabeaz.com 1- Reading User Input

    int main(int argc, char *argv[]) {
        double principle, payment, rate;
        double teaser_payment, teaser_rate;
        int teaser_period;
        scanf("%lf",&principle);
        scanf("%lf",&payment);
        scanf("%lf",&rate);
        scanf("%lf",&teaser_payment);
        scanf("%lf",&teaser_rate);
        scanf("%d",&teaser_period);
        ...
    }

    Parameters are read from the user when the program runs.

    shell % mortgage.exe
    500000.0
    3999.0
    0.09
    499.0
    0.04
    24
    Total paid 2623323.00
    Months 677
  87. Copyright (C) 2008, http://www.dabeaz.com 1- Reading User Input • Reading

    parameters from user works, but does not scale well to more complicated problems. • Example : A problem where you had to specify several hundred parameters • Also messy if the program is "branchy." • End-users probably find this to be clunky 89
  88. Copyright (C) 2008, http://www.dabeaz.com 1- A "Branchy" Interface shell %

    loan.exe Loan type (1=Mortgage, 2=Auto, 3=Commercial) : 1 Mortgage type (1=Conventional, 2=Evil) : 2 Principle : 500000 Payment : 3999 Rate : 0.09 Teaser Payment : 499 Teaser Rate : 0.04 Teaser Period : 24 Be a sneaky bugger? (Y=Yes, N=No) : Y ... 90 • You might laugh, but a huge amount of "mission-critical" software is often not much more sophisticated than this. • Many GUIs not much different (dialogs)
  89. Copyright (C) 2008, http://www.dabeaz.com 1- Configuration Languages

    • As applications grow, the process of reading input may evolve into a simple "configuration" language.

    # Dave's mortgage
    LOAN_TYPE = MORTGAGE
    MORTGAGE_TYPE = EVIL
    BE_SNEAKY = YES
    PRINCIPLE = 500000
    PAYMENT = 3999
    RATE = 0.09
    TEASER_RATE = 0.04
    TEASER_PAYMENT = 499

    • Notion of "programmability"
  90. Copyright (C) 2008, http://www.dabeaz.com 1- Configuration Languages

    • Configuration languages have a tendency to grow new features.
    • Example : Variable expansion and expressions

    # Dave's mortgage
    LOAN_TYPE = MORTGAGE
    MORTGAGE_TYPE = EVIL
    BE_SNEAKY = YES
    PRINCIPLE = 500000
    PAYMENT = 3999
    RATE = 0.09
    TEASER_RATE = $RATE/2
    TEASER_PAYMENT = $PAYMENT/10
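To make the idea of variable expansion concrete, here is a minimal sketch of a reader for this kind of file, written in Python 3. The function name `load_config` and the decision to support only a single `number/number` division are assumptions made for illustration; they do not describe any particular tool.

```python
import re

def load_config(text):
    """Parse NAME = VALUE lines, expanding $NAME references to earlier settings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding space
        if not line:
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        # Variable expansion: replace $NAME with the value defined earlier
        value = re.sub(r"\$(\w+)", lambda m: str(settings[m.group(1)]), value)
        # Expressions: this sketch only understands "number / number"
        m = re.fullmatch(r"([\d.]+)\s*/\s*([\d.]+)", value)
        if m:
            value = float(m.group(1)) / float(m.group(2))
        settings[name] = value
    return settings

config = load_config("""
# Dave's mortgage
PAYMENT = 3999
RATE = 0.09
TEASER_RATE = $RATE/2
TEASER_PAYMENT = $PAYMENT/10
""")
print(config["TEASER_RATE"], config["TEASER_PAYMENT"])
```

Even this toy version already needs a substitution pass and an expression evaluator, which is how a configuration file starts turning into a language.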
  91. Copyright (C) 2008, http://www.dabeaz.com 1- Configuration Languages

    • Example : Conditional Evaluation

    # Dave's mortgage
    LOAN_TYPE = MORTGAGE
    MORTGAGE_TYPE = EVIL
    BE_SNEAKY = YES
    PRINCIPLE = 500000
    PAYMENT = 3999
    RATE = 0.09
    if $PRINCIPLE > 350000
        TEASER_RATE = $RATE/2
        TEASER_PAYMENT = $PAYMENT/10
    else
        TEASER_RATE = $RATE/1.5
        TEASER_PAYMENT = $PAYMENT/5
    endif
  92. Copyright (C) 2008, http://www.dabeaz.com 1- Scripting • So, left to

    their own devices, programmers have had a tendency to create their own weird application-specific command/config languages. • You see this in large apps (e.g., VBA in Microsoft Office) • Typically done without thinking much about programming languages, theory, or previous work however. 94
  93. Copyright (C) 2008, http://www.dabeaz.com 1- Configuration Languages • If left

    unchecked, configuration languages may grow into some sort of ad-hoc domain-specific (scripting) language 95 “Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp.” - Philip Greenspun
  94. Copyright (C) 2008, http://www.dabeaz.com 1- Part 4 96 From Scripting

    to Dynamic Languages (Or how script programmers started fooling around with programming languages)
  95. Copyright (C) 2008, http://www.dabeaz.com 1- Rewind • The concept of

    a "scripting language" has been around for a very long time • JCL - IBM System/360 (1964) • sh - Unix Shell (1971) • Rexx - IBM (1979) • Basically, these languages are oriented around controlling applications and the operating system. 97
  96. Copyright (C) 2008, http://www.dabeaz.com 1- Command Shells • All application

    programmers know something about the operating system shell • They use it to run their applications, run the compiler, etc. • The command shell is often cited as an influence for the domain-specific languages that get created
  97. Copyright (C) 2008, http://www.dabeaz.com 1- Application Control • Scripting/Shell languages

    focus on running other applications • Most basic operation is running a program • Supplying arguments to a program 99 shell % someprog.exe foo bar 42 blah -x int main(int argc, char *argv[]) { ... } # args
  98. Copyright (C) 2008, http://www.dabeaz.com 1- I/O Routing • Shells provide

    mechanisms for I/O 100 shell % someprog.exe > out.txt shell % someprog.exe < in.txt shell % someprog.exe | otherprog.exe • So, you can hook programs up to files and hook programs up to other programs
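The same hook-up can be driven from a program rather than a shell. The sketch below, in Python 3, feeds the output of one process to the input of another. The two stand-in "programs" are hypothetical one-liners run with the current interpreter, and the stages run one after the other instead of concurrently as a real shell pipe would.

```python
import subprocess
import sys

# Two tiny stand-in "programs", run with the current Python interpreter.
PRODUCER = "print('spam'); print('eggs')"
CONSUMER = "import sys; print(sys.stdin.read().upper(), end='')"

def run_pipeline():
    """Rough equivalent of: producer | consumer"""
    producer = subprocess.run([sys.executable, "-c", PRODUCER],
                              capture_output=True, text=True, check=True)
    consumer = subprocess.run([sys.executable, "-c", CONSUMER],
                              input=producer.stdout,
                              capture_output=True, text=True, check=True)
    return consumer.stdout

print(run_pipeline())
```

Everything that crosses the boundary between the two programs is plain text, which is exactly the constraint the shell imposes.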
  99. Copyright (C) 2008, http://www.dabeaz.com 1- Environment Variables • Shells have

    global environment variables 101 shell % printenv PWD=/Users/beazley HOME=/Users/beazley LOGNAME=beazley HOSTTYPE=intel-pc VENDOR=apple OSTYPE=darwin MACHTYPE=i386 GROUP=staff shell % echo $LOGNAME beazley shell % • These variables are passed to applications • Values are just simple strings
  100. Copyright (C) 2008, http://www.dabeaz.com 1- Local Variables • Shells also

    have "local" variables 102 shell % x="Hello" shell % y="World" shell % echo "$x $y" Hello World
 shell % • These are not passed to applications, but you can export them to the global environment shell % export x
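The difference between a local variable and an exported one can be sketched in Python 3. The variable name `GREETING` and the helper `child_sees` are made up for this illustration; the child process is just the current interpreter printing what it finds in its environment.

```python
import os
import subprocess
import sys

CHILD = "import os; print(os.environ.get('GREETING', 'unset'))"

def child_sees(env):
    """Run a child process with the given environment and return what it prints."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()

greeting = "Hello"    # a "local" variable: invisible to child processes

# Start from the current environment, making sure GREETING is not already set
base_env = {k: v for k, v in os.environ.items() if k != "GREETING"}
print(child_sees(base_env))

# The analogue of `export`: copy the value into the child's environment
exported = dict(base_env, GREETING=greeting)
print(child_sees(exported))
```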
  101. Copyright (C) 2008, http://www.dabeaz.com 1- Variable Interpolation • Shells tend

    to work heavily with text • Shell interpreter performs variable substitutions prior to executing any command (known as interpolation). • Usually a special syntax is used ($var) 103 shell % cmd=ls shell % opts=-l shell % $cmd $opts /somedir -rw-r--r-- beazley staff 408 Apr 30 2007 foo -rw-r--r-- beazley staff 658 Apr 30 2007 bar -rw-r--r-- beazley staff 332 Apr 30 2007 spam ...
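Interpolation is a two-step process: substitute the variables into the text, then split the result into words. A minimal Python 3 sketch using the standard `string.Template` and `shlex` modules follows; the helper name `interpolate` is an assumption for illustration, and real shells apply many more expansion rules than this.

```python
import shlex
from string import Template

def interpolate(command, variables):
    """Substitute $name references, then split the result into argv-style words."""
    expanded = Template(command).substitute(variables)
    return shlex.split(expanded)

argv = interpolate("$cmd $opts /somedir", {"cmd": "ls", "opts": "-l"})
print(argv)   # ['ls', '-l', '/somedir']
```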
  102. Copyright (C) 2008, http://www.dabeaz.com 1- Control-Flow

    • Shells have some basic control-flow features

    if test $x -gt 0; then
        echo "$x is greater than 0"
    else
        echo "$x is not greater than 0"
    fi

    • Loops

    x="foo bar spam"
    for i in $x
    do
        echo $i
    done
  103. Copyright (C) 2008, http://www.dabeaz.com 1- Procedures

    • You can also define procedures in the shell

    add() {
        echo `expr $1 + $2`
    }

    shell % add 3 4
    7
    shell %

    • However, all of this starts to get weird pretty fast (e.g., no local variables)
  104. Copyright (C) 2008, http://www.dabeaz.com 1- Problems with Scripting • Small

    shell scripts tend to grow into large shell scripts (usually unintelligible) • If you want to write an "application", get convoluted mix of tools hooked together in bizarre ways • Very limited support for data processing and data structures (strings and lists of strings) • Slow 106
  105. Copyright (C) 2008, http://www.dabeaz.com 1- A Shell Program

    #!/bin/sh
    # Dave's mortgage
    principle=500000
    payment=499
    rate=0.04
    month=0
    total_paid=0
    while test `echo "$principle > 0" | bc -l` = 1; do
        principle=`echo "$principle*(1+$rate/12)-$payment" | bc -l`
        total_paid=`echo "$total_paid+$payment" | bc -l`
        month=`expr $month + 1`
        if test $month -eq 24; then
            payment=3999
            rate=0.09
        fi
    done
    echo "Total paid $total_paid"
    echo "Months $month"
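For comparison, here is the same monthly update written as a Python 3 function. It is a sketch rather than the course's own solution: the name `simulate_mortgage` and its parameter names are assumptions, and the arithmetic uses ordinary floating point instead of `bc`.

```python
def simulate_mortgage(principal, payment, rate,
                      teaser_payment, teaser_rate, teaser_period):
    """Return (total_paid, months) using the same monthly update as the script."""
    current_payment, current_rate = teaser_payment, teaser_rate
    total_paid = 0
    month = 0
    while principal > 0:
        # Add one month of interest, then subtract the payment
        principal = principal * (1 + current_rate / 12) - current_payment
        total_paid += current_payment
        month += 1
        if month == teaser_period:
            # Teaser period is over: switch to the regular terms
            current_payment, current_rate = payment, rate
    return total_paid, month

total_paid, months = simulate_mortgage(500000, 3999, 0.09, 499, 0.04, 24)
print("Total paid", total_paid)
print("Months", months)
```

With these inputs the loop runs for 677 months and pays 2623323 in total (24 payments of 499 followed by 653 payments of 3999), the same figures printed by the C program that reads its parameters from the user.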
  106. Copyright (C) 2008, http://www.dabeaz.com 1- Beyond Scripting • Programmers like

    the interactivity of shells • But they want to do more than just launch other programs and manipulate strings • So, there has always been an interest in expanding shells with features from "real" programming languages • More flexible data structures, proper procedures, control flow, variables, etc.
  107. Copyright (C) 2008, http://www.dabeaz.com 1- New Languages 110 • Perl

    (1987 - Larry Wall) • Tcl (1988 - John Ousterhout) • Python (1990 - Guido van Rossum) • Ruby (1993 - Yukihiro “Matz” Matsumoto) • PHP (1994 - Rasmus Lerdorf ) • Javascript (1995 - Brendan Eich/Netscape) • And many others...
  108. Copyright (C) 2008, http://www.dabeaz.com 1- Commentary 111 • Over a

    brief 5-10 year period, there was a sudden flurry of development in which a variety of new programming languages were created • All of these languages were created by single individuals, often without any “official” funding. • Not associated with academic CS. • Question : Why?
  109. Copyright (C) 2008, http://www.dabeaz.com 1- In their own words 112

    "The Tcl scripting language grew out of my work on design tools for integrated circuits [...] Each tool needed to have a command language. However, our primary interest was in the tools, not their command languages. Thus, we didn’t invest much effort in command languages and the languages ended up being weak and quirky. Furthermore, the language for one tool couldn’t be carried over to the next so each tool ended up with a different bad command language. After a while, this became rather embarrassing." - John Ousterhout
  110. Copyright (C) 2008, http://www.dabeaz.com 1- In their own words 113

    "Like the typical human, Perl was conceived in secret, and existed for roughly nine months before anyone in the world ever saw it. Its womb was a secret project for the National Security Agency known as the ‘Blacker’ project, which has long since closed down. The goal of that sexy project was not to produce Perl. However, Perl may well have been the most useful thing to come from Blacker. Sex can fool you that way." - Larry Wall (Note: Perl was created to process logs and generate reports)
  111. Copyright (C) 2008, http://www.dabeaz.com 1- In their own words 114

    "My original motivation for creating Python was the perceived need for a higher level language in the Amoeba [Operating Systems] project. I realized that the development of system administration utilities in C was taking too long. Moreover, doing these things in the Bourne shell wouldn't work for a variety of reasons. ... So, there was a need for a language that would bridge the gap between C and the shell." - Guido van Rossum
  112. Copyright (C) 2008, http://www.dabeaz.com 1- In the words of others

    115 "PHP, known originally as Personal Home Pages, was first conceived in the autumn of 1994 by Rasmus Lerdorf. He wrote it as a way to track visitors to his online CV. The first version was released in early 1995, by which time Rasmus had found that by making the project open-source, people would fix his bugs.” - From “A History of PHP” (In Q1, '07, there were 78 PHP books on the market).
  113. Copyright (C) 2008, http://www.dabeaz.com 1- Commentary • These programming languages

    have largely been developed outside of “academia.” • All are based in practical applications • None was meant to be a “theoretical” experiment in programming languages. • Much to the dismay of academic researchers in programming languages (“I don’t get no respect” - Rodney Dangerfield)
  114. Copyright (C) 2008, http://www.dabeaz.com 1- Food for Thought... 117 “Over

    the Christmas holiday in December 1989, hacker Guido van Rossum of the Netherlands was bored, so he created a descendent of the ABC scripting language for the Unix platform, dubbing it Python, from the British comedy troop Monty Python's Flying Circus." - Timothy Morgan "Python has been an important part of Google since the beginning, and remains so as the system grows and evolves." - Peter Norvig, Google.
  115. Copyright (C) 2008, http://www.dabeaz.com 1- Tcl • Tool Command Language

    ("Tickle") • Released as open-source in late 80s • One of the most influential early scripting languages • The big idea : A simple standardized programming language that could be easily added to other applications (like a library). • So you didn't have to write your own 118
  116. Copyright (C) 2008, http://www.dabeaz.com 1- Traditional C Programs • A

    traditional C program is launched from some kind of shell/command prompt • Command line arguments are passed as strings in an array (argv) 119 shell % someprog.exe foo bar 42 blah -x int main(int argc, char *argv[]) { ... } # args • There is just one entry point to the program (main) which figures out what to do
  117. Copyright (C) 2008, http://www.dabeaz.com 1- Tcl Big Picture • A

    simple interpreter that can call a collection of C functions using the same idea 120 Tcl Interpreter C Code User • User interacts with interpreter, issuing commands that call into C.
  118. Copyright (C) 2008, http://www.dabeaz.com 1- Sample Tcl Command

    • For each "command" you write C code

    int square(void *clientData, Tcl_Interp *interp,
               int argc, char *argv[]) {
        double x, y;
        if (argc != 2) {
            return TCL_ERROR;
        }
        x = atof(argv[1]);          /* Convert argument */
        y = x*x;                    /* Compute something */
        Tcl_SetDouble(interp,y);    /* Return result */
        return TCL_OK;
    }

    • Each command looks like a little C main() function
  119. Copyright (C) 2008, http://www.dabeaz.com 1- Tclsh Shell • Tcl then

    provided a "shell" where commands could be invoked (tclsh) 122 % square 4 16 % square 5 25 % Command Name Arguments int square(void *clientData, Tcl_Interp *interp, int argc, char *argv[]) { ... } Launches
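The dispatch idea behind this can be sketched in a few lines of Python 3. This models only the concept (a table of named commands that each receive string arguments); it is not the Tcl API, and the names `COMMANDS`, `command`, and `evaluate` are invented for the illustration.

```python
COMMANDS = {}

def command(func):
    """Register a function under its own name, like adding a command to an interpreter."""
    COMMANDS[func.__name__] = func
    return func

@command
def square(argv):
    # argv[0] is the command name; the remaining items are string arguments
    if len(argv) != 2:
        raise ValueError("usage: square x")
    x = float(argv[1])          # convert the argument from its string form
    return "%g" % (x * x)       # results go back as strings too

def evaluate(line):
    """Split a command line into words and call the matching function."""
    argv = line.split()
    return COMMANDS[argv[0]](argv)

print(evaluate("square 4"))
print(evaluate("square 5"))
```

Each registered function has the same shape as a small `main()`: it gets a list of strings and decides what to do with them.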
  120. Copyright (C) 2008, http://www.dabeaz.com 1- Using Tcl • Tcl basically

    lifted some ideas from the Unix shell and put them inside C programs • If you had a big C program with several hundred functions, you would take each function and turn it into a Tcl command • High-level control flow of the application driven by a Tcl script (instead of being hard- coded in C). 123
  121. Copyright (C) 2008, http://www.dabeaz.com 1- The Tcl Language • Tcl

    provides a whole language from which you can launch your commands • Variables • Conditionals • Loops • Procedures • Expressions 124
  122. Copyright (C) 2008, http://www.dabeaz.com 1- Sample Tcl Program

    # Dave's mortgage
    set principle 500000
    set payment 499
    set rate 0.04
    set month 0
    set total_paid 0
    while {$principle > 0} {
        set principle [expr {$principle*(1+$rate/12)-$payment}]
        incr total_paid $payment
        incr month
        if {$month == 24} {
            set payment 3999
            set rate 0.09
        }
    }
    puts "Total paid $total_paid"
    puts "Months $month"
  123. Copyright (C) 2008, http://www.dabeaz.com 1- Sample Tcl Program 126 •

    The language itself is kind of clunky • Everything is a string. • An expansion of shell programming • But, it is a full-featured programming language
  124. Copyright (C) 2008, http://www.dabeaz.com 1- Tcl/Tk Release • Shortly after

    Tcl, an optional add-on called "Tk" was released • Tk was a Tcl-based interface to a graphical user interface widget set and toolkit • At the time, it revolutionized GUI programming on UNIX systems. • Could build entire GUI using high level scripts 127
  125. Copyright (C) 2008, http://www.dabeaz.com 1- Tk Example 128 • A

    simple button proc pressed {} { puts "You pressed it!" } button .b -text "Press the button" -command pressed pack .b • This replaced several hundred lines of rather gnarly looking C code
  126. Copyright (C) 2008, http://www.dabeaz.com 1- Usage of Tcl/Tk • As

    originally envisioned, Tcl was meant to be a small language you added to huge C programs (most code written in C) • Didn't quite turn out that way. • Programmers wrote huge applications entirely in Tcl (> 100K lines) • Tcl/Tk used in a large number of mission critical applications (control systems, etc.) 129
  127. Copyright (C) 2008, http://www.dabeaz.com 1- The Tcl/Tk Experience • Today,

    Tcl/Tk is out of fashion, but it was very influential. • Showed that there was great utility in using a dynamic language to control code written in a static language (mixed languages) • Later became one of the first cross- platform GUI development languages • A lot of ground-breaking software engineering related to scripting 130
  128. Copyright (C) 2008, http://www.dabeaz.com 1- Tcl/Tk today • Most modern

    dynamic languages have an optional interface to Tcl/Tk for GUI programming. • Python (Tkinter), Perl/Tk, Ruby/Tk, Scheme/ Tk, etc (Tcl is often hidden behind scenes) • Many other languages copied much of the SW-engineering practices of Tcl. 131
  129. Copyright (C) 2008, http://www.dabeaz.com 1- Lessons Learned • Application programmers

    learned that a dynamic language made them far more productive • For example, creating a simple GUI in Tcl was something you could do in an afternoon • Mixed-language development. C for systems/ high performance, Tcl for control. 132
  130. Copyright (C) 2008, http://www.dabeaz.com 1- Perl • Released as open-source

    in 1987 • The big idea : Take concepts from various facets of shell programming, but create a general purpose programming language • Fix annoying issues with shell programs • Incorporate features from text processing tools (especially sed and awk) 133
  131. Copyright (C) 2008, http://www.dabeaz.com 1- Perl Influences • Syntax :

    Roughly taken from C • Expands shell scripting with some new data structures (lists and associative arrays) • Adds support for regular-expression pattern matching (from sed) • Major goal : Scripting related to data processing, text processing, report generation. 134
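The combination of regular expressions and associative arrays is what makes report generation short. To keep the added examples in one language, the sketch below uses Python 3 instead of Perl; the log format (`user size bytes`) and the function name `summarize` are hypothetical.

```python
import re
from collections import Counter

LOG_LINE = re.compile(r"^(\w+)\s+(\d+)\s+bytes$")

def summarize(lines):
    """Total the byte counts per user from lines like 'beazley 408 bytes'."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:                       # silently skip lines that do not fit
            user, size = match.groups()
            totals[user] += int(size)
    return dict(totals)

sample = ["beazley 408 bytes", "guido 658 bytes", "beazley 332 bytes", "garbage"]
for user, total in sorted(summarize(sample).items()):
    print("%-10s %6d" % (user, total))
```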
  132. Copyright (C) 2008, http://www.dabeaz.com 1- Sample Perl

    # Dave's mortgage
    $principle = 500000;
    $payment = 499;
    $rate = 0.04;
    $month = 0;
    $total_paid = 0;
    while ($principle > 0) {
        $principle = $principle*(1+$rate/12)-$payment;
        $total_paid += $payment;
        $month++;
        if ($month == 24) {
            $payment = 3999;
            $rate = 0.09;
        }
    }
    print "Total paid $total_paid\n";
    print "Months $month\n";
  133. Copyright (C) 2008, http://www.dabeaz.com 1- The Perl Experience • Perl

    greatly simplified tasks that were previously done using rather complicated shell scripts (and faster) • Showed that a lot of network, system admin, and web-development code could be written entirely in a "script language" • Completely dominated early web- development (CGI scripting) 136
  134. Copyright (C) 2008, http://www.dabeaz.com 1- The Perl Experience • Perl

    user community was very effective at organizing third-party modules, add-ons, and extensions • Huge contributed library (CPAN) • Very influential on other open-source language projects (Ruby, Python, etc.) 137
  135. Copyright (C) 2008, http://www.dabeaz.com 1- The Other Languages • Other

    languages have evolved from earlier experiences with Tcl, Perl, C etc. • Because everything has been done in the open, there is a lot of cross-pollination • For example, Python has copied Perl's regular expression features. Perl copied Python's object system (in a manner) 138
  136. Copyright (C) 2008, http://www.dabeaz.com 1- Big Picture • Extension languages.

    When building a large C application, it is useful to have an extension/ control language (e.g., Tcl). • Scripting languages. For scripting, it is useful to have a real programming language with useful data structures and high-level features (e.g., Perl). 139
  137. Copyright (C) 2008, http://www.dabeaz.com 1- Python Introduction • In this

    class, we will be using a wide variety of programming languages • This section serves as an introduction to one of the languages we'll be using more often. • More reference material is available online and in books 141
  138. Copyright (C) 2008, http://www.dabeaz.com 1- What is Python? • An

    interpreted, dynamically typed programming language. • In other words: A language that's similar to Perl, Ruby, Tcl, and other so-called "scripting languages." • Created by Guido van Rossum around 1990. • Named in honor of Monty Python 142
  139. Copyright (C) 2008, http://www.dabeaz.com 1- Getting Started • In this

    section, we will cover the absolute basics of Python programming • How to start Python • Interactive mode • Creating and running simple programs • Basic calculations and file I/O. 143
  140. Copyright (C) 2008, http://www.dabeaz.com 1- Python Interpreter • Python is

    an interpreter • If you give it a filename, it interprets the statements in that file • Otherwise, you get an "interactive" mode where you can experiment • No edit/compile/run/debug cycle 144
  141. Copyright (C) 2008, http://www.dabeaz.com 1- Running Python (Unix)

    • Command line

    shell % python
    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license"
    >>>

    • Integrated Development Environment (IDLE)

    shell % idle
  142. Copyright (C) 2008, http://www.dabeaz.com 1- Interactive Mode • Read-eval loop

    >>> print "hello world" hello world >>> 37*42 1554 >>> for i in range(5): ... print i ... 0 1 2 3 4 >>> • Executes simple statements typed in directly • Useful for debugging, exploration 147
  143. Copyright (C) 2008, http://www.dabeaz.com 1- Interactive Mode (IDLE) • Interactive

    shell plus help features syntax highlights usage information 148
  144. Copyright (C) 2008, http://www.dabeaz.com 1- Getting Help • Online help

    is often available • help() command (interactive mode) • Documentation at http://www.python.org 149
  145. Copyright (C) 2008, http://www.dabeaz.com 1- Creating Programs • Programs are

    put in .py files # helloworld.py print "hello world" • Source files are simple text files • Create with your favorite editor (e.g., emacs) • Note: May be special editing modes • Can also edit programs with IDLE or other Python IDE (too many to list) 150
  146. Copyright (C) 2008, http://www.dabeaz.com 1- Running Programs (IDLE) • Select

    "Run Module" (F5) • Will see output in IDLE shell window 154
  147. Copyright (C) 2008, http://www.dabeaz.com 1- Running Programs • In production

    environments, Python may be run from command line or a script • Command line (Unix) shell % python helloworld.py hello world shell % • Command shell (Windows) C:\Somewhere>c:\python25\python helloworld.py hello world C:\Somewhere> 155
  148. Copyright (C) 2008, http://www.dabeaz.com 1- A Sample Program • The

    Sears Tower Problem You are given a standard sheet of paper which you fold in half. You then fold that in half and keep folding. How many folds do you have to make for the thickness of the folded paper to be taller than the Sears Tower? A sheet of paper is 0.1mm thick and the Sears Tower is 442 meters tall. 156
  149. Copyright (C) 2008, http://www.dabeaz.com 1- A Sample Program

    # sears.py
    # How many times do you have to fold a piece of paper
    # for it to be taller than the Sears Tower?

    height = 442               # Meters
    thickness = 0.1*(0.001)    # Meters (0.1 millimeter)
    numfolds = 0
    while thickness <= height:
        thickness = thickness * 2
        numfolds = numfolds + 1
        print numfolds, thickness
    print numfolds, "folds required"
    print "final thickness is", thickness, "meters"
  150. Copyright (C) 2008, http://www.dabeaz.com 1- A Sample Program • Output

    % python sears.py 1 0.0002 2 0.0004 3 0.0008 4 0.0016 5 0.0032 ... 20 104.8576 21 209.7152 22 419.4304 23 838.8608 23 folds required final thickness is 838.8608 meters 158
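The answer can also be checked without a loop. After n folds the thickness is thickness × 2^n, so we need the smallest n with n > log2(height / thickness). The Python 3 sketch below compares that closed form with repeated doubling; both function names are assumptions for illustration.

```python
import math

def folds_required(height, thickness):
    """Smallest n with thickness * 2**n > height."""
    return math.floor(math.log2(height / thickness)) + 1

def folds_by_loop(height, thickness):
    """The same answer obtained by repeated doubling, as in sears.py."""
    numfolds = 0
    while thickness <= height:
        thickness *= 2
        numfolds += 1
    return numfolds

print(folds_required(442, 0.1 * 0.001), folds_by_loop(442, 0.1 * 0.001))
```

Here log2(442 / 0.0001) is about 22.08, so both approaches give 23 folds.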
  151. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Statements •

    A Python program is a sequence of statements • Each statement is terminated by a newline • Statements are executed one after the other until you reach the end of the file. • When there are no more statements, the program stops 159
  152. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Comments •

    Comments are denoted by # # This is a comment height = 442 # Meters 160 • Extend to the end of the line • There are no block comments in Python (e.g., /* ... */).
  153. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101: Variables • A

    variable is just a name for some value • Variable names follow same rules as C [A-Za-z_][A-Za-z0-9_]* • You do not declare types (int, float, etc.) height = 442 # An integer height = 442.0 # Floating point height = "Really tall" # A string • Differs from C++/Java where variables have a fixed type that must be declared. 161
  154. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101: Keywords • Python

    has a basic set of language keywords • These are mostly C-like and have the same meaning in most cases • Variables can not have one of these names 162 and assert break class continue def del elif else except exec finally for from global if import in is lambda not or pass print raise return try while yield
  155. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101: Looping • The

    while statement executes a loop • Executes the indented statements underneath while the condition is true 163 while thickness <= height: thickness = thickness * 2 numfolds = numfolds + 1 print numfolds, thickness
  156. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Indentation

    • Indentation is used to denote blocks of code
    • Indentation must be consistent

    (ok)
    while thickness <= height:
        thickness = thickness * 2
        numfolds = numfolds + 1
        print numfolds, thickness

    (error)
    while thickness <= height:
        thickness = thickness * 2
          numfolds = numfolds + 1
      print numfolds, thickness

    • Colon (:) always indicates start of new block

    while thickness <= height:
  157. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Conditionals •

    If-else if a < b: print "Computer says no" else: print "Computer says yes" • If-elif-else if a == '+': op = PLUS elif a == '-': op = MINUS elif a == '*': op = TIMES else: op = UNKNOWN 165
  158. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Relations •

    Relations return boolean values (True, False) >>> 3 < 4 True >>> 3 > 4 False >>> 166 • Relational operators < > <= >= == !=
  159. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Booleans •

    Boolean expressions (and, or, not) if b >= a and b <= c: print "b is between a and c" if not (b < a or b > c): print "b is still between a and c" • Non-zero numbers, non-empty objects also evaluate as True. 167 x = 42 if x: # x is nonzero else: # x is zero
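A quick way to see which values count as true is to run a handful of them through an `if`. This is a small Python 3 sketch; the helper name `is_truthy` is made up for the illustration.

```python
def is_truthy(value):
    """Report how a value behaves when used as a condition."""
    if value:
        return True
    else:
        return False

# Zero, empty containers, empty strings, and None all behave as False
for value in (42, 0, 0.0, "hello", "", [1, 2], [], None):
    print(repr(value), is_truthy(value))
```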
  160. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : Printing •

    The print statement print x print x,y,z print "Your name is", name print x, # Omits newline • Produces a single line of text • Items are separated by spaces • Works with any kind of Python object • Very useful for debugging 168
  161. Copyright (C) 2008, http://www.dabeaz.com 1- Python 101 : pass statement

    • Sometimes you will need to specify an empty block of code if name in namelist: # Do something else: pass # Not implemented yet 169 • pass is a "no-op" statement • It does nothing, but serves as a placeholder for statements (possibly to be added later)
  162. Copyright (C) 2008, http://www.dabeaz.com 1- Other Formatting Notes • Can

    put single-line bodies on same line for i in range(10): print i • Multiple statements on the same line (;) x = 4; y = 10; z = "hello" • Line continuation (\) if product=="game" and type=="pirate memory" \ and age >= 4 and age <= 8: print "I'll take it!" 170 • Line continuation not needed for (),[],{} if (product=="game" and type=="pirate memory" and age >= 4 and age <= 8): print "I'll take it!"
  163. Copyright (C) 2007, http://www.dabeaz.com 1- Basic Datatypes • Python only

    has a few primitive types of data • Numbers • Strings (character text) 171
  164. Copyright (C) 2007, http://www.dabeaz.com 1- Numbers • Python has 5

    basic kinds of numbers • Booleans • Integers • Long integers • Floating point • Complex (imaginary numbers) 172
  165. Copyright (C) 2007, http://www.dabeaz.com 1- Booleans (bool) • Two values:

    True, False a = True b = False • Evaluated as integers with value 1,0 c = 4 + True # c = 5 d = False if d == 0: print "d is False" • A relatively late addition to Python (v2.3) 173
  166. Copyright (C) 2007, http://www.dabeaz.com 1- Integers (int) • Signed integers

    up to machine precision a = 37 b = -299392993 c = 0x7fa8 # Hexadecimal d = 0253 # Octal • Typically 32 bits • Comparable to the C long type 174
  167. Copyright (C) 2007, http://www.dabeaz.com 1- Long Integers (long) • Arbitrary

    precision integers a = 37L b = -126477288399477266376467L • Integers that overflow promote to longs >>> 3 ** 73 67585198634817523235520443624317923L >>> a = 72883988882883812 >>> a 72883988882883812L >>> • Can almost always be used interchangeably with integers 175
  168. Copyright (C) 2007, http://www.dabeaz.com 1- Integer Operations + Add -

    Subtract * Multiply / Divide // Floor divide % Modulo ** Power << Bit shift left >> Bit shift right & Bit-wise AND | Bit-wise OR ^ Bit-wise XOR ~ Bit-wise NOT abs(x) Absolute value pow(x,y[,z]) Power with optional modulo (x**y)%z divmod(x,y) Division with remainder 176
  169. Copyright (C) 2007, http://www.dabeaz.com 1- Integer Division • Classic division

    (/) - truncates >>> 5/4 1 >>> • Floor division (//) - truncates (same) >>> 5//4 1 >>> • Future division (/) - Converts to float >>> from __future__ import division >>> 5/4 1.25 • Will change in some future Python version • If truncation is intended, use // 177
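The change did arrive: in Python 3, `/` performs true division by default and `//` is the way to ask for floor division. The sketch below assumes a Python 3 interpreter and simply records the results.

```python
true_quotient = 5 / 4            # true division always yields a float
floor_quotient = 5 // 4          # floor division keeps integers as integers
negative_floor = -5 // 4         # floors toward negative infinity, not toward zero
quotient, remainder = divmod(5, 4)

print(true_quotient, floor_quotient, negative_floor, quotient, remainder)
```

Note that floor division rounds down, so `-5 // 4` is `-2`, not `-1`.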
  170. Copyright (C) 2007, http://www.dabeaz.com 1- Floating point (float) • Use

    a decimal or exponential notation a = 37.45 b = 4e5 c = -1.345e-10 • Represented as double precision using the native CPU representation (IEEE 754) 17 digits of precision Exponent from -308 to 308 • Same as the C double type 178
  171. Copyright (C) 2007, http://www.dabeaz.com 1- Floating point • Be aware

    that floating point numbers are inexact when representing decimal values. >>> a = 3.4 >>> a 3.3999999999999999 >>> 179 • This is not Python, but the underlying floating point hardware on the CPU.
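A short Python 3 sketch of the practical consequences: equality tests on floats can fail unexpectedly, so compare with a tolerance, or use the standard `decimal` module when exact decimal arithmetic matters.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)                 # False: both sides carry rounding error
print(math.isclose(total, 0.3))     # True: compare with a tolerance instead
print(Decimal(3.4))                 # the exact binary value stored for 3.4
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True: exact decimal arithmetic
```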
  172. Copyright (C) 2007, http://www.dabeaz.com 1- Floating Point Operators + Add

    - Subtract * Multiply / Divide % Modulo (remainder) ** Power pow(x,y [,z]) Power modulo (x**y)%z abs(x) Absolute value divmod(x,y) Division with remainder • Additional functions are in the math module import math a = math.sqrt(x) b = math.sin(x) c = math.cos(x) d = math.tan(x) e = math.log(x) 180
  173. Copyright (C) 2007, http://www.dabeaz.com 1- Converting Numbers • Type name

    can be used to convert a = int(x) # Convert x to integer b = long(x) # Convert x to long c = float(x) # Convert x to float • Only work if type conversion makes sense >>> a = "Hello World" >>> int(a) ValueError: invalid literal for int() >>> • Also work with strings containing numbers >>> a = "3.14159" >>> float(a) 3.14159 >>> int("0xff",16) # Optional integer base 255 181
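Because a failed conversion raises `ValueError`, code that handles user input usually wraps the conversion. The following Python 3 sketch tries `int` first and then `float`; the helper name `to_number` and the choice to return `None` on failure are assumptions for illustration.

```python
def to_number(text):
    """Convert text to int if possible, else float, else None."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass                # this conversion does not apply; try the next one
    return None

print(to_number("42"), to_number("3.14159"), to_number("Hello World"))
print(int("0xff", 16))          # explicit base: 255
```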
  174. Copyright (C) 2008, http://www.dabeaz.com 1- Strings • Specified using quotes

    a = "Yeah but no but yeah but..." b = 'computer says no' c = ''' Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, not around the eyes, don't look around the eyes, look into my eyes, you're under. ''' • Standard escape sequences work (e.g., '\n') • Triple quotes capture all literal text enclosed 182
  175. Copyright (C) 2007, http://www.dabeaz.com 1- String Escape Codes

    • In literals, standard escape codes work

    '\n'      Line feed
    '\r'      Carriage return
    '\t'      Tab
    '\xhh'    Hexadecimal value
    '\"'      Literal quote
    '\\'      Backslash

    • Raw strings (don't interpret escape codes)

    a = r"\w+\.\w+"     # String exactly as specified (note the leading r)
  176. Copyright (C) 2007, http://www.dabeaz.com 1- String Representation • An ordered

    sequence of bytes (characters) 184 • Store 8-bit data (ASCII) • May contain binary data, embedded nulls • Strings are frequently used for both text and for raw-data of any kind
  177. Copyright (C) 2007, http://www.dabeaz.com 1- String Representation • Indexed array

    of characters : s[n] a = "Hello world" b = a[4] # b = 'o' c = a[-1] # c = 'd' (Taken from end of string) • Slicing/substrings : s[start:end] d = a[:5] # d = "Hello" e = a[6:] # e = "world" f = a[3:8] # f = "lo wo" g = a[-5:] # g = "world" • Concatenation (+) a = "Hello" + "World" b = "Say " + a 185
  178. Copyright (C) 2007, http://www.dabeaz.com 1- More String Operations • Length

    (len) >>> s = "Hello" >>> len(s) 5 >>> • Membership test (in) >>> 'e' in s True >>> 'x' in s False >>> "ello" in s True 186 • Replication (s*n) >>> s = "Hello" >>> s*5 'HelloHelloHelloHelloHello' >>>
  179. Copyright (C) 2007, http://www.dabeaz.com 1- String Methods • Stripping any

    leading/trailing whitespace t = s.strip() • Case conversion t = s.lower() t = s.upper() • Replacing text t = s.replace("Hello","Hallo") 187 • Strings have "methods" that perform various operations with the string data.
  180. Copyright (C) 2007, http://www.dabeaz.com 1- More String Methods s.endswith(suffix) #

    Check if string ends with suffix s.find(t) # First occurrence of t in s (-1 if not found) s.index(t) # First occurrence of t in s (raises ValueError if not found) s.isalpha() # Check if characters are alphabetic s.isdigit() # Check if characters are numeric s.islower() # Check if characters are lower-case s.isupper() # Check if characters are upper-case s.join(slist) # Joins lists using s as delimiter s.lower() # Convert to lower case s.replace(old,new) # Replace text s.rfind(t) # Search for t from end of string s.rindex(t) # Search for t from end of string s.split([delim]) # Split string into list of substrings s.startswith(prefix) # Check if string starts with prefix s.strip() # Strip leading/trailing space s.upper() # Convert to upper case 188
  181. Copyright (C) 2007, http://www.dabeaz.com 1- String Mutability • Strings are

    "immutable" • Once created, the value can't be changed >>> s = "Hello World" >>> s[1] = 'a' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object does not support item assignment >>> 189 • All operations and methods that manipulate string data always create new strings
  182. Copyright (C) 2007, http://www.dabeaz.com 1- String Conversions • To convert

    any object to string • Produces the same text as print s = str(obj) • Actually, print uses str() for output >>> x = [1,2,3,4] >>> str(x) '[1, 2, 3, 4]' >>> 190
  183. Copyright (C) 2007, http://www.dabeaz.com 1- String Splitting • Strings are

    often split into a list of strings >>> line = 'GOOG 100 490.10' >>> fields = line.split() >>> fields ['GOOG', '100', '490.10'] >>> 191 • Example: When reading data from a file, you might read each line and then split the line into columns or fields.
  184. Copyright (C) 2007, http://www.dabeaz.com 1- Lists • A sequence of

    arbitrary objects (an array) names = [ "Elwood", "Jake", "Curtis" ] nums = [ 39, 38, 42, 65, 111] • Can contain mixed types items = [ "Elwood", 39, 1.5 ] • Adding new items items.append("that") # Adds at end items.insert(2,"this") # Inserts in middle 192 • Concatenation : s + t s = [1,2,3] t = ['a','b'] s + t [1,2,3,'a','b']
  185. Copyright (C) 2007, http://www.dabeaz.com 1- Lists (cont) • Negative indices

    are from the end names[-1] "Curtis" 193 • Lists are indexed by integers (starting at 0) names = [ "Elwood", "Jake", "Curtis" ] names[0] "Elwood" names[1] "Jake" names[2] "Curtis" • Changing one of the items names[1] = "Joliet Jake"
  186. Copyright (C) 2007, http://www.dabeaz.com 1- More List Operations • Length

    (len) >>> s = ['Elwood','Jake','Curtis'] >>> len(s) 3 >>> • Membership test (in) >>> 'Elwood' in s True >>> 'Britney' in s False >>> 194 • Replication (s*n) >>> s = [1,2,3] >>> s*3 [1,2,3,1,2,3,1,2,3] >>>
  187. Copyright (C) 2007, http://www.dabeaz.com 1- Lists (Removal) • Removing an

    item names.remove("Curtis") • Deleting an item by index del names[2] 195 • Removal results in items moving down to fill the space vacated (i.e., no "holes").
  188. Copyright (C) 2007, http://www.dabeaz.com 1- File Input and Output •

    Opening a file f = open("foo.txt","r") # Open for reading g = open("bar.txt","w") # Open for writing • To read a line of text line = f.readline() • To write text to a file g.write(text) • To print to a file print >>g, "Your name is", name 196
  189. Copyright (C) 2007, http://www.dabeaz.com 1- Looping over a file •

    Reading a file line by line f = open("foo.txt","r") for line in f: # Process the line ... f.close() • Alternatively for line in open("foo.txt","r"): # Process the line ... • This reads all lines until you reach the end of the file 197
  190. Copyright (C) 2007, http://www.dabeaz.com 1- Simple Functions • Use functions

    for code you want to reuse def square(x): return x*x • Calling a function a = square(3) 198 • A function is just a series of statements that return a result or carry out some task
  191. Copyright (C) 2007, http://www.dabeaz.com 1- Library Functions • Python comes

    with a large standard library • Library modules accessed using import import math x = math.sqrt(10) import urllib u = urllib.urlopen("http://www.python.org/index.html") data = u.read() 199 • Will cover in more detail later
  192. Copyright (C) 2008, http://www.dabeaz.com 1- dir() function • dir() returns

    list of symbols >>> import sys >>> dir(sys) ['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__', '__stdin__', '__stdout__', '_current_frames', '_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright', 'displayhook', 'exc_clear', 'exc_info', 'exc_type', 'excepthook', 'exec_prefix', 'executable', 'exit', 'getcheckinterval', ... 'version_info', 'warnoptions'] • Useful for exploring, inspecting objects, etc. 200
  193. Copyright (C) 2007, http://www.dabeaz.com 1- Exception Handling • Errors are

    reported as exceptions • Cause the program to stop >>> f = open("file.dat","r") Traceback (most recent call last): File "<stdin>", line 1, in <module> IOError: [Errno 2] No such file or directory: 'file.dat' >>> 201 • For debugging, message describes what happened, where the error occurred, along with a traceback.
  194. Copyright (C) 2008, http://www.dabeaz.com 1- Exceptions • To catch, use

    try-except try: f = open(filename,"r") except IOError: print "Could not open", filename • To raise an exception, use raise raise RuntimeError("What a kerfuffle") 202 • Exceptions can be caught
  195. Copyright (C) 2008, http://www.dabeaz.com 1- Summary • This has been

    an overview of simple Python • Enough to write basic programs • Python code tends to be fairly readable • Just have to know the core datatypes and a few basics (loops, conditions, etc.) 203
  196. Copyright (C) 2008, http://www.dabeaz.com 2- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  197. Copyright (C) 2008, http://www.dabeaz.com 2- Working with Data and Data

    Structures Section 2 2 "The horror! The horror!" - Col. Kurtz
  198. Copyright (C) 2008, http://www.dabeaz.com 2- Introduction 3 • Most programs

    need to perform various kinds of data manipulation • Mathematical calculations • Text processing • Row/column oriented data (e.g., databases)
  199. Copyright (C) 2008, http://www.dabeaz.com 2- Overview 4 • In this

    section, we take a closer look at how dynamic languages handle data • Topics include: • Variables and values • Primitive data types • Operations on data • Compound data • Memory management
  200. Copyright (C) 2008, http://www.dabeaz.com 2- Disclaimer 5 • We're going

    to cover some topics that you normally do not find in the "user manual" for various languages. • My goal is to explore the design challenges and decisions that have been made in various languages. • The big picture
  201. Copyright (C) 2008, http://www.dabeaz.com 2- Variables 7 • To work

    with data, programs typically assign values to "variables" • A variable has a name which is known as an "identifier" • The identifier is used to identify values in subsequent calculations • The value is some sort of data
  202. Copyright (C) 2008, http://www.dabeaz.com 2- Static Typing double total; int

    x; 8 • In static languages such as C, C++, and Java, all variables must be declared and given a specific type in advance (declarations) • Underneath the covers, this binds the variable name to a fixed memory location that holds the value of the variable. • Type and location remain fixed
  203. Copyright (C) 2008, http://www.dabeaz.com 2- Dynamic Typing total = 0.0

    x = 42 9 • In dynamic languages, variables are just names for values • As the program runs, the value may change. • And it may change to a completely different type of data x = "foo" • Underneath the covers, it's just a table
  204. Copyright (C) 2008, http://www.dabeaz.com 2- Variable Tables a = 0.0

    b = 42 c = "Hello World" 10 'a' 'b' 'c' 0.0 42 "Hello World" Variable table • As your program runs, this table gets dynamically updated as variables are created, values get changed, and variables are destroyed.
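The variable table can be sketched as an ordinary dictionary. This is only an illustration in Python 3 (the slides themselves use Python 2); the name `table` is made up here and is not how any particular interpreter names its internals.

```python
# A toy "variable table": names map to values, and the table
# changes as the program runs.
table = {}

table['a'] = 0.0             # a = 0.0
table['b'] = 42              # b = 42
table['c'] = "Hello World"   # c = "Hello World"

table['b'] = "foo"           # rebinding: same name, value of a different type
del table['a']               # a variable is destroyed

print(sorted(table.items()))
```

In CPython, `globals()` returns a real dictionary of this kind for module-level names.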
  205. Copyright (C) 2008, http://www.dabeaz.com 2- Values 11 • A "value"

    represents some kind of data • Usually falls into a couple of categories • Primitive data (numbers and strings) • Compound data (arrays) • Objects • The treatment of values is actually a fairly complex problem (more soon)
  206. Copyright (C) 2008, http://www.dabeaz.com 2- Typeless Languages 12 • In

    some languages, all values are the same • For example, in shell scripts and Tcl, all values are just text strings set a 0.0 set b 42 set c "Hello World" 'a' "0.0" 'b' "42" 'c' "Hello World" • Because there are no types, programs simply interpret the value strings in different ways set c "$a + $b" # c -> "0.0 + 42" set c [expr "$a + $b"] # c -> "42.0"
  207. Copyright (C) 2008, http://www.dabeaz.com 2- Typed Languages 13 • Most

    dynamic languages use typed values a = 0.0 b = 42 c = "Hello World" 'a' (float, 0.0) 'b' (int, 42) 'c' (str,"Hello World") • Various operations in the language then look at the types to figure out what to do x = a + b # Ok. x = 42.0 y = b + c # Error. Can't add int and str • However, there is great variation in how "strict" a language is when types are mixed.
  208. Copyright (C) 2008, http://www.dabeaz.com 2- Strong Typing 14 • If

    a language is strongly typed, it tends to enforce strict rules about how values are used # Python a = 42 # An integer b = "Hello World" # A string x = a + b # Error • Any operation involving incompatible types may result in some kind of "Type Error"
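A small sketch of the strong typing behavior described above, written for Python 3. The variable names `outcome` and `explicit` are illustrative only.

```python
a = 42               # An integer
b = "Hello World"    # A string

# Mixing the two types is refused rather than silently converted.
try:
    x = a + b
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

# The programmer has to ask for the conversion explicitly.
explicit = str(a) + b

print(outcome)
print(explicit)
```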
  209. Copyright (C) 2008, http://www.dabeaz.com 2- Weak Typing 15 • A

    language may also be "weakly" typed. • In this case, the language performs implicit conversions to make certain operations go ahead. • For example, implicitly treating numbers as strings: // Javascript var a = 42 // An integer var b = "Hello World" // A string var x = a + b // x = "42Hello World"
  210. Copyright (C) 2008, http://www.dabeaz.com 2- Terminology 16 • "Strong" versus

    "weak" typing • Generally this just refers to whether or not a programming language makes a lot of implicit type conversions. a = 42 b = "Hello World" a + b Error # Strong typing a + b "42Hello World" # Weak typing • For example, even though C is statically typed, it is considered to be weakly typed
  211. Copyright (C) 2008, http://www.dabeaz.com 2- Type Safety 17 • A

    related issue is whether or not a language lets you "cast" values between incompatible data types. • Example : Pointer casting in C Foo *f; int x; x = (int) f; // OK. • This was one big difference between C and Pascal (C let you do anything you wanted)
  212. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 18 • In practice,

    programming languages don't always fall neatly into any one category • Certain parts of the language may appear to be strongly typed whereas other parts seem to be weakly typed. x = 42 # int y = 2.5 # float z = x + y # float (x implicitly converted to float) • If it's too strict, it's "safe" but quite fussy
  213. Copyright (C) 2008, http://www.dabeaz.com 2- Numeric Data 20 • Numbers

    are obviously one of the most common primitive data types • There are two basic kinds of numbers: • Integers : 123, -45, 1234 • Reals : 1.23, 4.5, 12e+34 • However, working with numbers is often a surprisingly difficult problem • Let's talk more about this....
  214. Copyright (C) 2008, http://www.dabeaz.com 2- Math on the CPU 21

    • The CPU of your computer supports math with a few primitive types • Integer word (32 or 64 bits) • Floating point (32 or 64 bits) • In static languages (C, Java), these map to very specific datatypes int # 32 bit integer long # 32 or 64 bit integer (depends) float # 32-bit floating point number double # 64-bit floating point
  215. Copyright (C) 2008, http://www.dabeaz.com 2- Native Integer Math 22 •

    On the CPU, integers are a bunch of bits 5 00000000000000000000000000000101 -5 11111111111111111111111111111011 00000000110101111110101000010001 sign bit "value" • Data representation is in 2's complement • Invert all bits and add 1 to go between +/- 32 bits
  216. Copyright (C) 2008, http://www.dabeaz.com 2- Native Integer Math 23 •

    Range of integers (32 bits) 10000000000000000000000000000000 = -2147483648 00000000000000000000000000000000 = 0 01111111111111111111111111111111 = 2147483647 • Commentary : The representation of numbers is a surprisingly complex problem (there are many ways to do it). It would be covered in more detail in a computer architecture course.
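The 2's complement rule from the native integer slides ("invert all bits and add 1") can be checked with a short Python 3 sketch. The helper names `to_bits32` and `invert_add_one` are invented for this example.

```python
MASK = 0xFFFFFFFF  # keep only the low 32 bits

def to_bits32(n):
    """Return the 32-bit two's complement bit pattern of n as a string."""
    return format(n & MASK, '032b')

def invert_add_one(bits):
    """Apply 'invert all bits and add 1' to a 32-character bit string."""
    inverted = int(bits, 2) ^ MASK
    return format((inverted + 1) & MASK, '032b')

print(to_bits32(5))
print(to_bits32(-5))
print(invert_add_one(to_bits32(5)) == to_bits32(-5))
print(to_bits32(-2147483648))
print(to_bits32(2147483647))
```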
  217. Copyright (C) 2008, http://www.dabeaz.com 2- Native Integer Math 24 •

    Native integers support the usual math operations (+, -, *, /) • Also, a number of "bitwise" operators • bitwise-or : 10110101 | 00101100 = 10111101 • bitwise-and : 10110101 & 00101100 = 00100100 • bitwise-xor : 10110101 ^ 00101100 = 10011001 • left shift : 10110101 << 1 = 01101010 • right shift : 10110101 >> 1 = 11011010 • invert : ~00101100 = 11010011
  218. Copyright (C) 2008, http://www.dabeaz.com 2- Integer Overflow 25 • On

    CPU, math operations that exceed the hardware range will overflow 1234567890 (01001001100101100000001011010010) * 2 = -1825831516 (10010011001011000000010110100100) Result overflows into the sign bit • C/C++ is completely silent when this happens (i.e., you don't get an error).
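Python integers do not overflow, so the 32-bit wraparound has to be simulated. The following Python 3 sketch does that; `wrap_int32` is a hypothetical helper, not a standard library function.

```python
def wrap_int32(n):
    """Reduce n to the value a 32-bit signed register would hold."""
    n &= 0xFFFFFFFF                 # discard everything above 32 bits
    if n & 0x80000000:              # sign bit set: value is negative
        return n - 0x100000000
    return n

print(wrap_int32(1234567890 * 2))   # the overflow example from the slide
print(wrap_int32(2147483647 + 1))   # one past the largest 32-bit value
```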
  219. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 26 • The behavior

    of integer math on the CPU is fairly well understood by C/C++ programmers (maybe) • Math operations in those languages are directly mapped to low-level machine instructions (C as a better assembly) • Truncation, overflow, and other aspects are just "features" of those languages.
  220. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Numbers 27 •

    Floating point numbers are a representation of the real numbers (decimals) • A number consists of three parts : sign (+,-), mantissa, and exponent • Example : -1.23647223 x 10^34 • "Floating Point" refers to the fact that the position of the decimal point varies
  221. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Numbers 28 •

    Fixed point numbers 1.23456 0.12345 0.01234 0.00123 0.00012 • Floating point numbers 1.23456 x 10^0 1.23456 x 10^-1 1.23456 x 10^-2 1.23456 x 10^-3 1.23456 x 10^-4 The exponent adjusts the position of the decimal point
  222. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Numbers 29 •

    On hardware, floating point numbers are merely a different interpretation of the bits 00000000110101111110101000010001 • 32 bit float : sign bit (1 bit), exponent (8 bits), mantissa (23 bits) • Value is computed as (+/-) mantissa * 2^exponent
  223. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Numbers 30 •

    There are two main types of floats • Described by a standard : IEEE 754 • Single precision (32-bit) sign bit exponent mantissa 8 bits 23 bits • Double precision (64-bit) sign bit exponent mantissa 11 bits 52 bits
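The bit fields can be pulled apart with Python's `struct` module. This Python 3 sketch uses invented helper names (`float32_fields`, `float64_fields`). Note that IEEE 754 stores the exponent with a bias (127 for single, 1023 for double) and leaves the leading 1 of the mantissa implicit, so the raw fields are not the mathematical exponent and mantissa.

```python
import struct

def float32_fields(x):
    """Split x, stored as a 32-bit float, into (sign, exponent, mantissa) fields."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa

def float64_fields(x):
    """Split x, stored as a 64-bit float, into (sign, exponent, mantissa) fields."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63                        # 1 bit
    exponent = (bits >> 52) & 0x7FF          # 11 bits, biased by 1023
    mantissa = bits & ((1 << 52) - 1)        # 52 bits
    return sign, exponent, mantissa

print(float32_fields(1.0))
print(float32_fields(-2.0))
print(float64_fields(1.0))
```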
  224. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Numbers 31 •

    Numerical range of floating point • Single precision : 8 digits of accuracy, max value 3.4 x 10^38 • Double precision : 17 digits of accuracy, max value 1.8 x 10^308 • Given a choice, most people use double
  225. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Math 32 •

    The CPU has a floating point unit to perform math operations (+,-,*,/, sqrt, sin, cos, tan, etc). • One caution : Floating point can not accurately represent decimals (all values are approximate). >>> x = 3.4 >>> x 3.39999999999999 >>> >>> x = 0.1 * 0.1 >>> print x 0.01 >>> x == 0.01 False >>>
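One common way to live with the approximation is to compare floats with a tolerance instead of `==`. A minimal Python 3 sketch follows; the tolerance `1e-9` is an arbitrary choice for illustration.

```python
import math

x = 0.1 * 0.1

exact_match = (x == 0.01)                            # exact comparison
close_match = math.isclose(x, 0.01, rel_tol=1e-9)    # comparison with a tolerance

print(repr(x))
print(exact_match, close_match)
```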
  226. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Math 33 •

    Because floating point is approximate, repeated calculations result in mathematical errors that accumulate in a program. • Normally, this is covered in a numerical analysis/numerical methods class D. Goldberg, "What Every Computer Scientist Should Know About Floating Point Arithmetic" • One reason why floating point is sometimes avoided in business software
  227. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Errors 34 •

    Certain operations result in exceptions (divide by 0, overflow, sqrt(-1)) • There are three special values +Inf Positive infinity -Inf Minus infinity NaN Not a number • These get encoded in a special way in the number (exponent field set to all 1s).
  228. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Errors 35 •

    Design issue : If a math calculation produces an exceptional value (+Inf, -Inf, NaN), should it cause a program to abort or should the program keep running? • Note : These special values are "sticky". Arithmetic involving +Inf, -Inf, or NaN usually produces another one of those values as a result. NaN in particular never turns back into a normal number (a few operations on infinities do, e.g., 1/Inf is 0).
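A short Python 3 sketch of how the special values propagate. It also includes 1/Inf, which in IEEE 754 arithmetic yields an ordinary zero, so the "stickiness" is strongest for NaN. The dictionary name `results` is just for this example.

```python
import math

inf = float('inf')
nan = float('nan')

results = {
    'inf + 1':   inf + 1,     # still infinity
    'inf - inf': inf - inf,   # undefined, so NaN
    'nan * 0':   nan * 0,     # NaN stays NaN
    '1 / inf':   1 / inf,     # an ordinary zero
}

for expr, value in results.items():
    print(expr, '=', value)

# NaN does not even compare equal to itself.
print(nan == nan)
```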
  229. Copyright (C) 2008, http://www.dabeaz.com 2- Floating Point Errors 36 •

    If you ignore errors, a program may run for a very long time silently producing garbage data (NaNs, Inf, etc.) • If you cause an abort, an unexpected math error (e.g., due to some kind of transient event) might cause the whole program to mysteriously crash.
  230. Copyright (C) 2008, http://www.dabeaz.com 2- Interlude 37 • Math is

    tricky • Classic example : Ariane 5 Rocket Launch (1996) • Exploded 37 seconds after launch • Cause : Overflow in a float to integer conversion produced an uncaught math exception (which then caused the guidance software to dump core).
  231. Copyright (C) 2008, http://www.dabeaz.com 2- Math in Dynamic Languages 40

    • Dynamic languages tend to be very high level x = 12345 # An integer y = 123.45 # A float • Design question : Should a high-level language force programmers to think about low-level implementation details regarding math? (e.g., bits, overflow, etc.)
  232. Copyright (C) 2008, http://www.dabeaz.com 2- Integers 41 • It is

    common to represent integers by mapping them to the native integer type • This is beneficial for performance • If so, integers will have a fixed range: • -2147483648 -> 2147483647 (32 bits) • Question : what happens if you go outside that range?
  233. Copyright (C) 2008, http://www.dabeaz.com 2- Integer Overflow 42 • Easy

    solution : Do nothing. Just let math operations overflow like they do in C. • Example: Tcl set x 1234567890 set y 1234567890 set z [expr $x + $y] puts $z # Outputs -1825831516 • It's all perfectly intuitive if you're a C programmer (in fact, you'd ideally write your program to depend on this "feature" in some sort of very crucial, but diabolical way)
  234. Copyright (C) 2008, http://www.dabeaz.com 2- Digression : Story of Mel

    43 "Perhaps my greatest shock came when I found an innocent loop that had no test in it. No Test. None. Common sense said it had to be a closed loop, where the program would circle, forever, endlessly. Program control passed right through it, however, and safely out the other side. It took me two weeks to figure it out." "The vital clue came when I noticed ... incrementing the instruction address would make it overflow..." http://www.cs.utah.edu/~elb/folklore/mel.html
  235. Copyright (C) 2008, http://www.dabeaz.com 2- Promotion to Floats 44 •

    Another strategy : Silently promote integers that overflow to floating point numbers • Example: PHP <?php $x = 1234567890; $y = 1234667890; $z = $x + $y; print(gettype($x) . "\n"); // prints "integer" print(gettype($z) . "\n"); // prints "double" ?>
  236. Copyright (C) 2008, http://www.dabeaz.com 2- Promotion to Bignums 45 •

    Another strategy : Promote integers to arbitrary precision longs/bignums • Example: Ruby/Python x = 1234567890 # int y = 1234567890 # int z = x + y # long print type(x) # produces <type 'int'> print type(z) # produces <type 'long'> • In this case, integers are allowed to grow to arbitrary size.
  237. Copyright (C) 2008, http://www.dabeaz.com 2- Bignum Implementation 46 • Typically,

    a bignum is a sequence of native integers chained together. 32 bits 32 bits 32 bits ... • The number of parts is allowed to grow or shrink dynamically to accommodate the number as necessary bignum
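The "chain of native integers" idea can be sketched in Python 3 by splitting a number into 32-bit parts. The functions `to_limbs` and `from_limbs` are hypothetical; real implementations differ in part size, sign handling, and storage.

```python
def to_limbs(n, bits=32):
    """Split a non-negative integer into fixed-size parts, least significant first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    mask = (1 << bits) - 1
    limbs = []
    while True:
        limbs.append(n & mask)   # take the low 'bits' bits
        n >>= bits               # and move on to the next part
        if n == 0:
            break
    return limbs

def from_limbs(limbs, bits=32):
    """Rebuild the integer from its parts."""
    n = 0
    for part in reversed(limbs):
        n = (n << bits) | part
    return n

print(to_limbs(1234567890))   # fits in one part
print(to_limbs(2**64))        # needs three parts
```

The number of parts grows only when the value needs it, which is the behavior the slide describes.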
  238. Copyright (C) 2008, http://www.dabeaz.com 2- Bignums for Everything? 47 •

    Why not just store all integers using the big number format? • Calculations will be slower due to extra processing overhead • Big numbers take more memory • Since small integer values are the most common, it makes little sense to penalize them.
  239. Copyright (C) 2008, http://www.dabeaz.com 2- Integers as Floats 48 •

    Some languages just always represent integers using double precision floating point numbers • Example : Perl, Javascript $x = 1234567890; # float $y = 1234567890; # float $z = $x + $y; # float • In this case, you just dispense with the problem of having to promote values (all numbers are the same type)
  240. Copyright (C) 2008, http://www.dabeaz.com 2- Integers as Floats 49 •

    If you use floats, you'll get an extended range of exact integer values (53 bits) : 0 to 9007199254740992.0 (9.0e+15) • If you go beyond this, things get "weird" 9007199254740992.0 + 1 -> 9007199254740992.0 (same) 9007199254740992.0 + 2 -> 9007199254740994.0 9007199254740992.0 + 3 -> 9007199254740996.0 • Will start to get "gaps" between numbers
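The gaps beyond 2^53 are easy to reproduce with Python 3 floats, which are double precision. The names `limit`, `a`, `b`, and `c` are only for this sketch.

```python
limit = 2.0 ** 53      # 9007199254740992.0, the end of the exact integer range

a = limit + 1          # rounds back down to limit
b = limit + 2          # representable
c = limit + 3          # rounds up to limit + 4

print(repr(limit))
print(repr(a), repr(b), repr(c))
```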
  241. Copyright (C) 2008, http://www.dabeaz.com 2- Integers as Floats 50 •

    There are some downsides • Floating point math is slower than integer math on the hardware. However, maybe you don't care in an interpreted language. • Increased memory footprint. 64-bit floats take twice as much memory as 32-bit ints. • May find special cases at/around 32 bit limit. For example, systems interfaces may only work with 32 bit integer values
  242. Copyright (C) 2008, http://www.dabeaz.com 2- Example : Bitwise Ops 51

    • Sometimes silently truncated at 32 bits x = 9876543210 a = x * 2; # Multiplies x by 2 b = x << 1; # Multiplies x by 2 • Perl : a = 19753086420, b = 4294967294 • Python/Ruby : a = 19753086420, b = 19753086420 • PHP : a = 19753086420, b = -2 • Javascript : a = 19753086420, b = -1721750060
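The Javascript result can be reproduced in Python 3 by emulating its rule that `<<` first converts the left operand to a 32-bit signed integer. The helpers `to_int32` and `js_left_shift` are made up for this sketch and only handle integer inputs.

```python
def to_int32(n):
    """Reduce an integer to a 32-bit signed value."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def js_left_shift(x, count):
    """Emulate Javascript's x << count for integer x."""
    return to_int32(to_int32(x) << (count & 31))

x = 9876543210
a = x * 2                   # ordinary multiplication
b = x << 1                  # Python shift: bignums, no truncation
js_b = js_left_shift(x, 1)  # what a 32-bit shift produces

print(a, b, js_b)
```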
  243. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 52 • One of

    the reasons why Python/Ruby use integer bignums is to provide mathematical consistency across the entire range of integer values • There is no mysterious 32-bit cut-off for some operations and not others
  244. Copyright (C) 2008, http://www.dabeaz.com 2- Integer Use Cases 53 •

    There are actually very few practical applications that need the accuracy of really big integer numbers (e.g., cryptography) • Most uses of integers are for counting and for indexing into data (e.g., array lookup). • Example : Indexing bytes in a file 1 Gigabyte 1073741824.0 1 Terabyte 1099511627776.0 1 Petabyte 1125899906842624.0 Largest int in a 64-bit float 9007199254740992.0 • For now, using floats is fine.
  245. Copyright (C) 2008, http://www.dabeaz.com 2- Integer Division -7/3 -2.333333333333333 (Perl,Javascript,PHP)

    -7/3 -3 (Python, Ruby, Tcl) -7/3 -2 (C, C++, Java) 54 • Integer division behaves differently in different languages (a surprise!) • Choices: • Convert to floating value (exact value) • Floor division (closest integer less than the value) • Truncate towards zero.
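All three choices can be shown side by side in Python 3, where `/` converts to a float and `//` performs floor division. `trunc_div` is a hypothetical helper that mimics the truncate-towards-zero behavior of C, C++, and Java.

```python
def trunc_div(n, d):
    """Integer division that truncates towards zero."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q

n, d = -7, 3

true_result = n / d              # convert to floating value
floor_result = n // d            # floor division
trunc_result = trunc_div(n, d)   # truncate towards zero

print(true_result, floor_result, trunc_result)
```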
  246. Copyright (C) 2008, http://www.dabeaz.com 2- Integer Division 55 • Currently,

    the trend in dynamic languages may be to make integer division convert the result to a floating point number if result not exact • Python is changing integer division in v. 3.0 • This change is highly controversial (I was even skeptical when I first heard it).
  247. Copyright (C) 2008, http://www.dabeaz.com 2- Why Worry? 56 • In

    compiled languages, you can write functions that expect to work with floats, but you can use them fine with integers float midpoint(float x, float y) { return (x+y)/2; } ... float m = midpoint(12,17); // m = 14.5 • Inside the compiler, it knows the arguments are supposed to be floats. So, it automatically converts the integer arguments to floats. • It all just works.
  248. Copyright (C) 2008, http://www.dabeaz.com 2- Why Worry? 57 • In

    dynamic languages, functions are written with no type information def midpoint(x, y): return (x+y)/2 ... m = midpoint(12,17) # m = 14 m = midpoint(12.0,17.0) # m = 14.5 • It is very easy to silently introduce numerical errors into a program. • You can code around it, but it is error prone, makes code hard to read, and runs slower.
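One way to "code around it" is to force a float into the division. This sketch runs under Python 3, where `/` already returns a float, so the `2.0` matters mainly for Python 2 code like that on the slide. `midpoint_floor` is an invented name for the variant that keeps the integer behavior.

```python
def midpoint(x, y):
    # Dividing by 2.0 forces a float result even when x and y are integers.
    return (x + y) / 2.0

def midpoint_floor(x, y):
    # Explicit floor division, for when the integer result is really wanted.
    return (x + y) // 2

print(midpoint(12, 17))
print(midpoint(12.0, 17.0))
print(midpoint_floor(12, 17))
```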
  249. Copyright (C) 2008, http://www.dabeaz.com 2- Why it's controversial 58 •

    There are important uses of truncating integer division • Most common : Date/time calculations seconds = x; minutes = seconds / 60 hours = minutes / 60 days = hours / 24 • A good subject for flame wars involving people with far too much spare time
  250. Copyright (C) 2008, http://www.dabeaz.com 2- Why not Rationals? 59 •

    Some programming languages convert integer division into exact rational numbers (fractions) • Example: Common Lisp > (setf seconds 123456) 123456 > (setf minutes (/ seconds 60)) 10288/5 > (setf hours (/ minutes 60)) 2572/75 > (setf days (/ hours 24)) 643/450 > • Mathematically interesting, but not practical for most people
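The same calculation can be reproduced in Python 3 with the standard `fractions` module. This is only an illustration of exact rational arithmetic; Python does not do this for ordinary integer division.

```python
from fractions import Fraction

seconds = Fraction(123456)
minutes = seconds / 60
hours = minutes / 60
days = hours / 24

print(minutes, hours, days)
```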
  251. Copyright (C) 2008, http://www.dabeaz.com 2- Numbers and Strings 60 •

    Some languages blur the distinction between numbers and text strings x = "42 bottles" y = "37 bottles" $x + $y 79 (Perl, PHP) x + y "42 bottles37 bottles" (Python, Ruby) • If a string is used in a context that expects a number, it may be converted to a number
  252. Copyright (C) 2008, http://www.dabeaz.com 2- Numbers and Strings 61 •

    In some cases, it's a little diabolical // Javascript var x = "42" var y = "37" var a = x + y; // a = "4237" var b = x * y; // b = 1554 • If numbers and strings are mixed, it's more common to have separate string/math ops # Perl/PHP $x = "42"; $y = "37"; $a = $x + $y; # Numeric add : a = 79 $b = $x . $y; # String concat : b = "4237"
  253. Copyright (C) 2008, http://www.dabeaz.com 2- Mixed Numbers/Strings 62 • Underneath

    the covers, an interpreter may keep multiple representations of data $x = "1 bottle"; str : "1 bottle" num : 1 • If used as a number, the numeric value will be saved and reused in later calculations • Perl does this.
  254. Copyright (C) 2008, http://www.dabeaz.com 2- Numeric Type Promotion 63 •

    Many languages have multiple numeric types int : 42 long : 1273894812883991923 float : 1.2374623 complex : 1.23 + 4.5j • In calculations involving mixed types, numbers are converted to the same type. • Usually done in a way so accuracy is not lost 42 + 4.5 -> 42.0 + 4.5 • You still need to be careful
  255. Copyright (C) 2008, http://www.dabeaz.com 2- Alternatives to Floats 64 •

    Generalized Decimal Arithmetic • Example : Python Decimal() types >>> import decimal >>> a = decimal.Decimal("3.45") >>> b = decimal.Decimal("7.22") >>> a + b Decimal("10.67") >>> a / b Decimal("0.4778393351800554016620498615") • This module performs exact decimal ops • IBM General Decimal Arithmetic Spec.
  256. Copyright (C) 2008, http://www.dabeaz.com 2- Interlude 65 • As you

    can tell, numbers are a bit of a mess • Not entirely standardized across languages • Many possibilities for program errors (especially with weak typing) • Many tradeoffs and design considerations • Let's move on to something more "simple"...
  257. Copyright (C) 2008, http://www.dabeaz.com 2- Part 3 66 The equally

    secret, but quite different, life of text
  258. Copyright (C) 2008, http://www.dabeaz.com 2- Strings 67 • A string

    typically refers to a sequence of characters x = "Hello World" • Most programmers are generally familiar with working with strings. • Operations on strings are fairly well "standardized" across languages.
  259. Copyright (C) 2008, http://www.dabeaz.com 2- String Literals 68 • There

    are a number of quoting conventions a = "Hello World" b = 'Hello World' c = """This is a multiline string. It captures all text.""" • "Heredoc" assignment (Perl, PHP, Ruby) a = <<END All of the text from here on is captured just as it is typed. END
  260. Copyright (C) 2008, http://www.dabeaz.com 2- String Interpolation 69 • You

    can sometimes substitute variables name = "Dave" text = "Your name is $name" # Perl/PHP/Tcl text = "Your name is ${name}" # Alternative text = "Your name is #{name}" # Ruby • This is notably absent in Python/Javascript (although you can sometimes hack it) name = "Dave" text = "Your name is %(name)s" % vars() # Python
  261. Copyright (C) 2008, http://www.dabeaz.com 2- String Operations 70 • There

    are common string operations • stripping/chopping " text \n" -> "text" • splitting "text1 text2 text3" -> [ "text1", "text2", "text3" ] • replacing "Hello World" -> "Hello There" • Just read a manual to find out how
  262. Copyright (C) 2008, http://www.dabeaz.com 2- Strings are Hard 71 •

    There are a number of "hard" real-world issues concerning the use, design, and implementation of strings. • These issues are often overlooked/ignored by programmers (at their own peril) • We're not one of those programmers
  263. Copyright (C) 2008, http://www.dabeaz.com 2- Problem 72 • A string

    is a sequence of characters x = "Hello World" • Yes, yes, please continue... • Question : What is a character? • Think about it for a moment...
  264. Copyright (C) 2008, http://www.dabeaz.com 2- Characters 73 • A character

    is a number • 65 'A' (ASCII) • The number is a symbolic representation of some sort of a writing element (e.g., a letter) • The number 65 represents the letter 'A' • A character is not the visual presentation (that's called a "glyph"). • Oh, and a character is not a byte
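The difference between a character (a number) and its bytes can be seen in Python 3, where strings hold character codes and bytes are a separate type. The slides use Python 2, where strings are byte sequences, so treat this as a sketch of the idea rather than of the course's own code.

```python
code = ord('A')        # character -> number
letter = chr(65)       # number -> character

n_tilde = '\u00f1'     # the single character ñ

# The same character becomes different bytes under different encodings.
utf8 = n_tilde.encode('utf-8')
latin1 = n_tilde.encode('latin-1')

print(code, letter)
print(len(n_tilde), utf8, latin1)
```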
  265. Copyright (C) 2008, http://www.dabeaz.com 2- The Character Song 74 •

    Characters are not bytes • Characters are not bytes • Characters are not bytes • Characters are not bytes • Characters are not bytes
  266. Copyright (C) 2008, http://www.dabeaz.com 2- Wait a Minute! 75 •

    Characters are bytes and always have been! • Characters 0-127 : ASCII • Characters 128-255 : Everything else • Just look it up in the manual... you'll see. "A string is an array of bytes." - From Ruby in a Nutshell • Yes, it's common for characters to be bytes
  267. Copyright (C) 2008, http://www.dabeaz.com 2- History Lesson 76 • Much

    of early computing was invented in the west (US/Europe) • Strong bias towards European languages and European characters • Which, conveniently, happened to mostly fit into a single byte of data. • Example : ASCII character set • But there was other horrible weirdness
  268. Copyright (C) 2008, http://www.dabeaz.com 2- Early Home Computers 77 •

    In the late 70s, every manufacturer implemented ASCII, but did whatever they wanted with the rest of the characters Starship Enterprise! Incriminating drink stain "\x09\x0a"
  269. Copyright (C) 2008, http://www.dabeaz.com 2- Example: Microsoft CP437 78 •

    Charset for IBM- PC (1981) • Taken from Wang word-processing machines • A hodge-podge of letters, nums, accented letters, and symbols
  270. Copyright (C) 2008, http://www.dabeaz.com 2- Characters as Bytes 80 •

    The practice of storing characters in a single byte is not workable in general • Thousands of different world languages • Some have thousands of characters (e.g., Chinese, Japanese, Korean, etc.) • Is everyone going to go create their own mutually incompatible character set? (well, yes, actually).
  271. Copyright (C) 2008, http://www.dabeaz.com 2- Example: Big5 81 • An

    encoding of Chinese that emerged out of using MS-DOS in Taiwan (early 80s) CP437 (Multibyte characters sort of "hacked" to overlay with CP437)
  272. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode 82 • A standardized

    mapping of characters to numerical codes • First appeared ~1991 and is periodically updated by a consortium • Simple explanation : Assign a unique number to all characters used by humans in all written languages. • (Sounds like a good project for a hapless Ph.D. student).
  273. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode 83 • Characters are

    organized into code charts (http://www.unicode.org/charts)
  274. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Big Picture 86 •

    There are currently about 100000 characters • Characters 0-127 correspond to ASCII • Other character sets are mapped to different ranges of numbers (usually given in hex) • Example: Armenian (0530-058F) • Example: Mongolian (1800-18AF)
  275. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Big Picture 87 •

    Unicode just assigns numbers • The unicode standard does NOT specify how characters are supposed to be represented in memory • Does NOT specify how characters are supposed to be stored in files • And it's not even entirely consistent on how you represent certain characters
  276. Copyright (C) 2008, http://www.dabeaz.com 2- What is a Character? 88

    • In theory, there is a unique integer code for each character. • However, in some languages, there are characters and then there are characters with modifiers (e.g.,ä, ã, â, á, à, å). • Unicode gives all of these variants a separate numerical code. ä U+00e4 ã U+00e3 â U+00e2 á U+00e1 à U+00e0
  277. Copyright (C) 2008, http://www.dabeaz.com 2- What is a Character? 89

    • But certain characters can also be constructed by adding modifiers • ä = a + ̈ (0061 + 0308) • So, you might have multiple representations of "Jalapeño" • With a single ñ (00f1) : 004a 0061 006c 0061 0070 0065 00f1 006f • With n + ̃ (006e 0303) : 004a 0061 006c 0061 0070 0065 006e 0303 006f
  278. Copyright (C) 2008, http://www.dabeaz.com 2- String Comparison 90 • If

    the same text has multiple representations, how do you do string comparison? • Well, in general you don't • To do this, you would have to "normalize" strings to one standard representation • A related, but equally nasty problem : Alphabetization (collation)
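A minimal sketch of the normalization idea, assuming Python 3 and its standard `unicodedata` module (the helper name `same_text` is made up for illustration). It builds "Jalapeño" both ways and shows that the strings only compare equal after both are normalized to one form:

```python
import unicodedata

precomposed = "Jalape\u00f1o"     # ñ as the single code point U+00F1
combining = "Jalapen\u0303o"      # n followed by the combining tilde U+0303

# The two strings display the same but are different code point sequences.
print(precomposed == combining)            # False
print(len(precomposed), len(combining))    # 8 9

def same_text(a, b):
    """Compare two strings after normalizing both to the NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(precomposed, combining))   # True
```

NFC is only one of several normalization forms; which one is appropriate depends on the application.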
  279. Copyright (C) 2008, http://www.dabeaz.com 2- Character Collation 91 • From

    a French-English Dictionary • Collated characters e U+0065 è U+00E8 é U+00E9
  280. Copyright (C) 2008, http://www.dabeaz.com 2- Character Collation 92 • The

    collation of characters varies by language/region, not by character set • So, to make sorting work, you would have to have some kind of collating sequence that specifies the desired order [..., c, d, e, è, é, ê, ë, f, g, h, ...] • Bloody hell • Let's move on...
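To see why sorting needs more than code point order, here is a rough sketch in Python 3. The key function `rough_key` is a hypothetical stand-in, not a real collation algorithm: it decomposes accented letters so the base letter is compared first, which happens to reproduce the dictionary order above for these few letters. Real collation is language-specific and far more involved.

```python
import unicodedata

letters = ["f", "é", "e", "ë", "d", "è", "ê"]

# Plain sorting compares code points, so accented letters land after "f".
by_code_point = sorted(letters)

def rough_key(s):
    """Decompose accented letters (NFD) so the base letter is compared first."""
    return unicodedata.normalize("NFD", s)

by_base_letter = sorted(letters, key=rough_key)

print(by_code_point)     # ['d', 'e', 'f', 'è', 'é', 'ê', 'ë']
print(by_base_letter)    # ['d', 'e', 'è', 'é', 'ê', 'ë', 'f']
```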
  281. Copyright (C) 2008, http://www.dabeaz.com 2- Strings (Revisited) 93 • So,

    a string is just a sequence of "characters" x = "Hello World" • Question : You're the language designer. Are you going to support Unicode? • Well, it's 2008, so let's assume yes...
  282. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Implementation 94 • How

    do you represent Unicode strings? • Unicode characters are just numbers • Well, just store the numbers then...
  283. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Implementation 95 • Option

    1: Make each character a 32-bit int • This is known as UCS-4 • More than enough bits to represent all unicode characters, but it hogs memory • ASCII text takes 4 times as much memory • Memory is cheap--buy more RAM. • Worse performance (e.g., CPU cache)
  284. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Implementation 96 • Option

    2 : Make each character a slightly smaller, but still large enough integer. • For example : 20 bits • Fine except that 20 bits is pretty odd • No C,C++,Java datatype for that. • Not natively supported on the CPU. • Will run slow as hell. • Nobody does this.
  285. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Implementation 97 • Option

    3 : Make each character a 16-bit int • Known as UCS-2 (very common) • Much less memory overhead. • But 16-bits is not enough to represent all of the unicode characters • However, the Unicode people thought of that...
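The memory trade-off between the fixed-width options can be checked with a quick sketch in Python 3. It uses the built-in `utf-32-le` and `utf-16-le` codecs as stand-ins for UCS-4 and UCS-2, which is an assumption that holds for plain ASCII text like this:

```python
text = "Hello World"   # 11 ASCII characters

sizes = {}
for name in ("utf-32-le", "utf-16-le", "utf-8"):
    sizes[name] = len(text.encode(name))
    print(name, sizes[name])
```

For this string the three encodings take 44, 22, and 11 bytes, matching the "4 times as much memory" estimate for 32-bit characters.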
  286. Copyright (C) 2008, http://www.dabeaz.com 2- Surrogate Pairs 98 • Large

    Unicode characters (above U+FFFF) can be encoded into a pair of smaller character codes • U+D800 - U+DFFF (Surrogate pairs) • How it works : U+1D122 • Subtract 10000 : 0D122 • As 20 bits : 00001101000100100010 • Split into 2x10 bits : 0000110100 0100100010 • Add the first half to D800 and the second half to DC00 : 1101100000110100 1101110100100010 • A pair of 16 bit values : (U+D834, U+DD22)
  287. Copyright (C) 2008, http://www.dabeaz.com 2- Surrogate Pairs 99 • Some

    unicode characters now get encoded as a pair of "sort of" characters • U+1D122 becomes (U+D834, U+DD22) • How is that supposed to work in practice? • Does an application programmer check? • If surrogate pairs get handled automatically, you are probably working in a string encoding known as UTF-16
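The surrogate pair arithmetic is small enough to sketch directly. This follows the standard UTF-16 rule (subtract 0x10000, split the remaining 20 bits into two 10-bit halves, add them to 0xD800 and 0xDC00); the function name `to_surrogate_pair` is made up for illustration, and Python 3 is assumed:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1D122)
print(hex(high), hex(low))             # 0xd834 0xdd22
```

The result can be cross-checked against the built-in codec: `chr(0x1D122).encode("utf-16-be")` yields the same four bytes.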
  288. Copyright (C) 2008, http://www.dabeaz.com 2- Unicode Implementation 100 • Option

    4. Variable length encoding • Example : UTF-8 • Now, this is something you see a lot <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> • But, what is UTF-8 exactly?
  289. Copyright (C) 2008, http://www.dabeaz.com 2- UTF-8 101 • Characters 0-127

    are ASCII (backwards compatibility) • The rest of the entire Unicode character set is encoded using bytes in the numerical range 128-255. • However, single characters may require a variable number of bytes to be represented
  290. Copyright (C) 2008, http://www.dabeaz.com 2- UTF-8 102 0 - 127

    0nnnnnnn (0x0 - 0x7f) ASCII
    128 - 2047 : 110nnnnn 10nnnnnn (0x80 - 0x7ff)
    2048 - 65535 : 1110nnnn 10nnnnnn 10nnnnnn (0x800 - 0xffff)
    65536 - 2097151 : 11110nnn 10nnnnnn 10nnnnnn 10nnnnnn (0x10000 - 0x1fffff, although Unicode itself stops at 0x10ffff)
    • Bits set in the first byte determine how many additional data bytes follow • Data bytes start with 10 and carry 6-bit chunks
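As a rough illustration of the bit patterns, the following Python 3 sketch encodes a single code point by hand. The function `utf8_encode` is hypothetical and deliberately minimal: it does not reject surrogates or values above U+10FFFF, which a real encoder must do.

```python
def utf8_encode(cp):
    """Encode one code point into UTF-8 bytes using the 1- to 4-byte patterns."""
    if cp < 0x80:                      # 0nnnnnnn
        return bytes([cp])
    if cp < 0x800:                     # 110nnnnn 10nnnnnn
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110nnnn 10nnnnnn 10nnnnnn
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 11110nnn 10nnnnnn 10nnnnnn 10nnnnnn
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for cp in (0x41, 0xF1, 0x20AC, 0x1D122):
    print(hex(cp), utf8_encode(cp).hex())
```

Each result can be compared with Python's own encoder, for example `chr(0xF1).encode("utf-8")`.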
  291. Copyright (C) 2008, http://www.dabeaz.com 2- UTF-8 103 • UTF-8 has

    some nice properties • Can often be plugged into legacy programs that just process characters as bytes • Problem : Not a good internal format for randomly accessing unicode characters. • Example : Array lookup s = "some unicode string" c = s[n] # Does this return the nth character # or does it return the nth byte?
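The indexing question has a concrete answer once you pick a representation. In this Python 3 sketch (an illustration of one language's choice, not a general rule), a text string indexes by character while its UTF-8 encoded form indexes by byte:

```python
s = "Jalapeño"
b = s.encode("utf-8")

print(len(s), len(b))   # 8 characters, 9 bytes
print(s[6])             # the 7th character: ñ
print(b[6])             # the 7th byte: 195 (0xC3), only the first half of ñ
```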
  292. Copyright (C) 2008, http://www.dabeaz.com 2- Other Stuff that Sucks 104

    • Putting unicode characters in string literals • Source code encoded in Unicode • Read/writing Unicode data from files (later) • Unicode character properties database a = "¼" b = "x" numeric(a) -> 0.25 numeric(b) -> false
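The `numeric()` pseudocode above has a close counterpart in Python's standard `unicodedata` module. A small sketch, assuming Python 3; note that the real function raises `ValueError` for a non-numeric character unless a default is supplied, so `None` is passed here instead of getting `false`:

```python
import unicodedata

quarter = unicodedata.numeric("¼")         # 0.25
not_a_number = unicodedata.numeric("x", None)   # None: "x" has no numeric value
category = unicodedata.category("¼")       # "No" (Number, other)

print(quarter, not_a_number, category)
```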
  293. Copyright (C) 2008, http://www.dabeaz.com 2- What about Bytes? 105 •

    There is still a need for processing data as raw sequences of bytes • Example : Processing binary formats (images, sound files, video, etc.) • Example : Fast ASCII Text processing • Do you have to wedge this into all of the Unicode processing?
  294. Copyright (C) 2008, http://www.dabeaz.com 2- Byte Strings 106 • One

    solution is to provide an entirely different primitive datatype for "byte strings" • Example : Python-3000 s = b"just some bytes" • This raises new issues : Do you allow text strings and byte strings to intermix? • If so, what rules define that relationship?
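One possible answer to the intermixing question is the one Python 3 ended up with: text and byte strings never combine implicitly, and the programmer has to convert with a named encoding. A minimal sketch:

```python
text = "just some text"
data = b"just some bytes"

try:
    combined = text + data             # implicit mixing is rejected
except TypeError as exc:
    combined = None
    print("cannot mix:", exc)

# An explicit conversion states which encoding is meant.
combined = text + " / " + data.decode("ascii")
print(combined)
```

Other languages make different choices here, so this is one data point rather than the rule.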
  295. Copyright (C) 2008, http://www.dabeaz.com 2- Interlude 107 • All dynamic

    programming languages are wrestling with the unicode problem right now • It's a complicated issue because these languages have grown entirely out of real-world application development. • Many of the issues are quite subtle. • We haven't even discussed I/O yet!
  296. Copyright (C) 2008, http://www.dabeaz.com 2- True Story 108 • I

    once met some American programmers working on a news web site that published some articles in Spanish. They couldn't figure out how to deal with the special "Spanish" characters so they just dropped them entirely. "That's a spicy Jalapeo" • Don't be like those guys...
  297. Copyright (C) 2008, http://www.dabeaz.com 2- Strings (Reprise) 109 • A

    string is a sequence of characters x = "Hello World" • Yes, conceptually simple, but some horrible details concerning "characters" • But let's assume you've sorted that out. • Question : How do strings actually behave in our favorite dynamic language?
  298. Copyright (C) 2008, http://www.dabeaz.com 2- String Mutability 110 • Question

    : Can you modify the contents of a string after you create it? • Sometimes yes : Perl, PHP, Ruby irb(main):001:0> a = "Hello World" => "Hello World" irb(main):002:0> a[1] = 'a' => "a" irb(main):003:0> a => "Hallo World" • Sometimes no : Python, Javascript >>> a = "Hello World" >>> a[1] = 'a' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object does not support item assignment >>>
  299. Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 111 • Allows

    in-place modification of string data • High performance for manipulating huge strings $text =~s/Foo/Bar/g; # Substitution in perl • Question : But what happens here? $other = $text; • It's fast because you can modify the contents without making a new copy in memory.
  300. Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 112 • Copy

    on assignment. When saving the value of a string in a new variable, make a fresh copy. • Copy by reference. Just copy a pointer. Of course, this can lead to bizarre sharing. $other = $text; # ruby a = "Hello World" b = a b.sub!("Hello","Hello Cruel") print a # "Hello Cruel World"
  301. Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 113 • Copy

    on write. Initially copy by reference, but if anyone makes a modification, make a local copy a = "Hello World" b = a b[1] = 'a' "Hello World" a b "Hello World" a b "Hallo World" copy • Sounds tricky...
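Copy on write is less tricky than it sounds once the bookkeeping is spelled out. The following Python 3 toy is only a sketch under stated assumptions: `CowString` and `_Buffer` are invented names, and a real interpreter would do this in C with far more care. Copies share one buffer, and the first modification through a shared handle makes a private copy.

```python
class _Buffer:
    """Mutable character storage plus a count of the strings sharing it."""
    def __init__(self, chars):
        self.chars = chars
        self.owners = 1

class CowString:
    def __init__(self, text):
        self._buf = _Buffer(list(text))

    def copy(self):
        """Copy by reference: the new string shares the same buffer."""
        other = CowString.__new__(CowString)
        other._buf = self._buf
        self._buf.owners += 1
        return other

    def __setitem__(self, index, char):
        if self._buf.owners > 1:           # shared: make a private copy first
            self._buf.owners -= 1
            self._buf = _Buffer(list(self._buf.chars))
        self._buf.chars[index] = char

    def __str__(self):
        return "".join(self._buf.chars)

a = CowString("Hello World")
b = a.copy()
shared_before = b._buf is a._buf           # True: nothing copied yet
b[1] = "a"                                 # triggers the copy
print(a, "|", b)                           # Hello World | Hallo World
```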
  302. Copyright (C) 2008, http://www.dabeaz.com 2- Mutable Strings 114 • Because

    of sharing, mutable strings may have one set of methods that always return a new string and another set that modify a string "in-place" • Example: Ruby Create new strings In-place ------------------ --------- s.capitalize s.capitalize! s.chomp s.chomp! s.gsub s.gsub! s.strip s.strip! ...
  303. Copyright (C) 2008, http://www.dabeaz.com 2- Dangers 115 • Working with

    mutable strings requires a certain degree of programming discipline • Since values might be shared, changes can unexpectedly affect other parts of code (like working with pointers in C++) • Could get real messy if working with Unicode and multibyte character sets because of internal representation and encoding issues.
  304. Copyright (C) 2008, http://www.dabeaz.com 2- Immutable Strings 116 • Immutable

    strings are much simpler • Since they are read-only, all operations that manipulate strings always return new strings s = "Hello World" a = s.upper() # a = "HELLO WORLD" b = s.replace("Hello","Hallo") # b = 'Hallo World' • Copies are always made by reference a = "Hello World" b = a c = b "Hello World" a b c
  305. Copyright (C) 2008, http://www.dabeaz.com 2- Immutable Strings 117 • The

    fact that strings are immutable allows operations to be optimized inside the interpreter • For example : Use of small strings to refer to named fields, etc. • Gives more freedom in how strings are represented/manipulated internally (since programs aren't allowed to touch the bits)
  306. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 118 • That might

    have been more about numbers and strings than you ever wanted to know • In my experience, working programmers only have a flimsy grasp of the details (especially when it concerns Unicode). • One goal of going into detail has been to inform you of the important issues and pitfalls. • Note : There is not going to be a unicode quiz
  307. Copyright (C) 2008, http://www.dabeaz.com 2- Data Structures 120 • In

    programs, it is often necessary to represent data that consists of multiple parts • Example: A holding of stock Name : "GOOG" (string) Shares : 100 (integer) Price : 490.10 (float) 100 shares of GOOG at 490.10 • There are three basic components
  308. Copyright (C) 2008, http://www.dabeaz.com 2- Structs in Static Languages 121

    • In static programming languages (C, Java, etc.), data structures are managed by defining a "structure" or "class" struct StockHolding { char name[8]; int shares; double price; }; • This precisely defines the members, the memory layout, and other low-level details
  309. Copyright (C) 2008, http://www.dabeaz.com 2- Grouping Values 122 • Dynamic

    languages don't really have "structs" • Instead, you can group values together g = "GOOG", 100, 490.10 a = "AAPL", 50, 123.45 • This becomes a single object composed of multiple parts (sometimes known as a tuple) • You can pass it around in your program as a single "value"
  310. Copyright (C) 2008, http://www.dabeaz.com 2- Using Compound Data 123 •

    When values are grouped, the components are typically ordered (like an array) g = "GOOG", 100, 490.10 name = g[0] shares = g[1] price = g[2] • However, you also just unpack values like this: g = "GOOG", 100, 490.10 ... name, shares, price = g
  311. Copyright (C) 2008, http://www.dabeaz.com 2- Packing/Unpacking 124 • This concept

    of packing/unpacking values is surprisingly rich in most dynamic languages • Most programmers aren't even aware of the full extent to which this actually works • Some examples follow
  312. Copyright (C) 2008, http://www.dabeaz.com 2- Example (Python/Ruby) 125 Date =

    16, "Jan", 2008 Time = 17, 30 When = Date, Time Who = "David Beazley", "[email protected]" Lecture = "Working with Data", Who, When Packing: Lecture = ( "Working with Data", ("David Beazley","[email protected]"), ( (16,"Jan", 2008), (17, 30) ) )
  313. Copyright (C) 2008, http://www.dabeaz.com 2- Example (Python/Ruby) 126 Unpacking: Lecture

    = ( "Working with Data", ("David Beazley","[email protected]"), ( (16,"Jan", 2008), (17, 30) ) ) Just flip the sides and put in some variable names: ( title, (name,email), ( (day,month,year), (hour,minute) ) ) = Lecture
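A runnable version of the nested packing and unpacking, sketched in Python 3. The email address is a placeholder, since the original is not recoverable from the transcript:

```python
lecture = (
    "Working with Data",
    ("David Beazley", "dave@example.com"),    # placeholder address (assumption)
    ((16, "Jan", 2008), (17, 30)),
)

# The unpacking pattern mirrors the shape of the packed data.
title, (name, email), ((day, month, year), (hour, minute)) = lecture
print(title, name, day, month, year, hour, minute)

# Parts you don't care about can be bound to a throwaway name such as _
_, _, ((only_day, _, only_year), _) = lecture
print(only_day, only_year)
```

If the shapes don't match, Python raises an error at the assignment rather than silently misaligning values.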
  314. Copyright (C) 2008, http://www.dabeaz.com 2- Example (Perl) 127 @Date =

    (16, "Jan", 2008); @Time = (17, 30); @When = (@Date, @Time); @Who = ("David Beazley", "[email protected]"); @Lecture = ("Working with Data", @Who, @When); Unpacking: ($title, ($name,$email), ( ($day,$month,$year), ($hour, $min) ) ) = @Lecture; Packing:
  315. Copyright (C) 2008, http://www.dabeaz.com 2- Example (PHP) 128 $Date =

    array(16, "Jan", 2008); $Time = array(17, 30); $When = array($Date, $Time); $Who = array("David Beazley", "[email protected]"); $Lecture = array("Working with Data", $Who, $When); Unpacking: list($title, list($name,$email), list( list($day,$month,$year), list($hour,$minute) ) ) = $Lecture; Packing:
  316. Copyright (C) 2008, http://www.dabeaz.com 2- Example (PHP) 129 $Date =

    array(16, "Jan", 2008); $Time = array(17, 30); $When = array($Date, $Time); $Who = array("David Beazley", "[email protected]"); $Lecture = array("Working with Data", $Who, $When); Unpacking (with ignored values) list(, list(,), list( list($day,$month,$year), list(,) ) ) = $Lecture; Packing:
  317. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 130 • Working with

    data using packed values tends to be quite efficient • Fairly small memory footprint • Implementation is highly optimized in the interpreter (dynamic languages often rely on these same data structures for their own operation)
  318. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 131 • Packing and

    unpacking values only really works well if data consists of a small number of parts • It would be extremely annoying to do this with a 50-field database row • There may be constraints on packed values. For example, in Python, such objects are immutable.
  319. Copyright (C) 2008, http://www.dabeaz.com 2- Using Named Fields 132 •

    An alternative approach is to store data using objects involving named fields • Dictionaries, hashes, associative arrays, etc. # Python g = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10 } # Ruby g = { 'name' => 'GOOG', 'shares' => 100, 'price' => 490.10 } # PHP $g = array('name' => 'GOOG', 'shares' => 100, 'price' => 490.10); # Perl %g = ( name => 'GOOG', shares => 100, price => 490.10 ); # Javascript g = { name : 'GOOG', shares : 100, price : 490.10 };
  320. Copyright (C) 2008, http://www.dabeaz.com 2- Dictionaries/Hashes 133 • The behavior

    is almost identical across languages • Work like arrays but you use the field names shares = g['shares']; # Retrieval g['shares'] = 75; # Assignment • Unlike a normal array, there is no ordering. • Keys aren't stored in alphabetical order, etc.
  321. Copyright (C) 2008, http://www.dabeaz.com 2- Collections of Objects 135 •

    Programs often have to work with collections of "objects" • Example : A collection of stocks in a portfolio YHOO 50 19.25 AAPL 100 143.41 SCOX 500 4.21 GOOG 20 490.10 MSFT 50 67.12 JAVA 75 6.23 IBM 50 91.10
  322. Copyright (C) 2008, http://www.dabeaz.com 2- Choices 136 • In most

    dynamic languages, there are two very common choices for collections • List or array (ordered sequence of items) • Associative array/hash table (unordered data) • We've already used these in the last section.
  323. Copyright (C) 2008, http://www.dabeaz.com 2- Lists/Arrays 137 • An ordered

    sequence of values items = [1, 3.5, "Hello"] • Items are accessed by numerical indices n = len(items) # Number of items a = items[i] # Retrieve the ith item items[i] = b # Change the ith item • There are often append/insert/delete operations items.append(x) items.remove(y) items.insert(i,z) • Read the manual to know exact syntax
  324. Copyright (C) 2008, http://www.dabeaz.com 2- Hashes/Dictionaries 138 • An unordered

    collection of values prices = { 'GOOG' : 523.10, 'AAPL' : 172.23, 'IBM' : 105.44 } • Values are accessed by keys n = len(prices) # Number of items a = prices['GOOG'] # Retrieve 'GOOG' value prices['SCOX'] = 0 # Change the 'SCOX' value • Likewise, there are various operations for manipulating the contents
  325. Copyright (C) 2008, http://www.dabeaz.com 2- Array/Dictionary Data 139 • A

    critical part of using containers is knowing that you can store any kind of data that you want inside • This includes other lists and hashes YHOO 50 19.25 AAPL 100 143.41 SCOX 500 4.21 GOOG 20 490.10 MSFT 50 67.12 JAVA 75 6.23 IBM 50 91.10 [ ['YHOO', 50, 19.25], ['AAPL', 100, 143.41], ['SCOX', 500, 4.21], ['GOOG', 20, 490.10], ['MSFT', 50, 67.12], ['JAVA', 75, 6.23], ['IBM', 50, 91.10] ] list of lists
  326. Copyright (C) 2008, http://www.dabeaz.com 2- Another Example 140 YHOO 50

    19.25 AAPL 100 143.41 SCOX 500 4.21 GOOG 20 490.10 MSFT 50 67.12 JAVA 75 6.23 IBM 50 91.10 [ { 'name' :'YHOO', 'shares' : 50, 'price' : 19.25 }, { 'name' : 'AAPL', 'shares' : 100, 'price' : 143.41 } ... ] list of dicts
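To show how far a plain list of dicts gets you, here is a small Python 3 sketch that stores the portfolio from the slide and computes its total value. The variable names are illustrative only:

```python
rows = [
    ("YHOO", 50, 19.25), ("AAPL", 100, 143.41), ("SCOX", 500, 4.21),
    ("GOOG", 20, 490.10), ("MSFT", 50, 67.12), ("JAVA", 75, 6.23),
    ("IBM", 50, 91.10),
]

# Build the list of dicts from the raw rows.
portfolio = [{"name": n, "shares": s, "price": p} for n, s, p in rows]

# Each holding is worth shares * price; sum over the whole list.
total = sum(h["shares"] * h["price"] for h in portfolio)
print(round(total, 2))    # 35588.75
```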
  327. Copyright (C) 2008, http://www.dabeaz.com 2- "First Class" Data 141 •

    Sometimes you will hear computer scientists talking about so-called "First Class" objects. • This means that whatever they're talking about can be used as a data value in a program. • You can assign it to a variable • You can store it in an array. • It has equal status with primitive types • In most dynamic languages, everything is FC
  328. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 142 • An adept

    programmer can write significant programs that do nothing but perform operations on lists and hashes • These data structures are powerful enough to do almost any kind of data processing you would ever need to do. • Note : You almost never hear of people implementing things like linked lists and search trees in these languages (why bother?)
  329. Copyright (C) 2008, http://www.dabeaz.com 2- Array Implementation 143 • For

    ordered data, an array is usually just a resizable array of references to values items = [1, 3.5, "Hello"] 1 3.5 "Hello" • It's an array of pointers to the values items
  330. Copyright (C) 2008, http://www.dabeaz.com 2- Dictionary Implementation 144 • For

    dictionaries/hashes, you get a mapping of keys to values 523.10 172.23 105.44 prices 'GOOG' 'IBM' 'AAPL' • The tricky part : Searching for keys prices = { 'GOOG' : 523.10, 'AAPL' : 172.23, 'IBM' : 105.44 }
  331. Copyright (C) 2008, http://www.dabeaz.com 2- Implementing Dictionaries 145 • The

    critical part of creating a dictionary is knowing what to do with the keys • Can a key be any object or is it restricted to strings? • How do you perform a fast key-lookup?
  332. Copyright (C) 2008, http://www.dabeaz.com 2- Hashing 146 • Any object

    that can be used as a key provides a hash operation. • This usually computes an integer value irb(main):023:0> a = "GOOG" => "GOOG" irb(main):024:0> a.hash => 252612492 irb(main):025:0> Hash value • You use the hash value to perform a lookup
  333. Copyright (C) 2008, http://www.dabeaz.com 2- Hashing 147 • A hash

    table has some number of "slots" (N) "GOOG" 252612492 ... N-1 0 j % N hashval "GOOG" 523.10 • Ideally, each key has a different hashing slot
  334. Copyright (C) 2008, http://www.dabeaz.com 2- Hashing 148 • For collisions,

    you typically chain ... N-1 0 "name1" Value1 • However, the design of this can be complex (consult an algorithms book) "name2" Value2 "name3" Value3
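Slots and chaining can be sketched in a few lines of Python 3. This toy class (`ChainedHashTable` is an invented name) leaves out everything that makes real implementations complex, such as resizing and deletion, and it is not how any particular interpreter builds its dictionaries:

```python
class ChainedHashTable:
    """Toy hash table: N slots, each holding a chain of (key, value) pairs."""

    def __init__(self, nslots=8):
        self.slots = [[] for _ in range(nslots)]

    def _chain(self, key):
        # slot index = hashval % N
        return self.slots[hash(key) % len(self.slots)]

    def __setitem__(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                   # existing key: replace the value
                chain[i] = (key, value)
                return
        chain.append((key, value))         # new key (or collision): extend chain

    def __getitem__(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        raise KeyError(key)

prices = ChainedHashTable(nslots=2)        # few slots, so collisions must occur
prices["GOOG"] = 523.10
prices["AAPL"] = 172.23
prices["IBM"] = 105.44
prices["GOOG"] = 490.10                    # overwrite
print(prices["GOOG"], prices["IBM"])
```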
  335. Copyright (C) 2008, http://www.dabeaz.com 2- Comments on Hashing 149 •

    Hash tables are one of the most essential data structures in virtually every dynamic language • Used not only by end-users, but for the implementation of the interpreter itself • Reading : A. Kuchling, "Python's Dictionary Implementation : Being All Things To All People", in Beautiful Code (O'Reilly)
  336. Copyright (C) 2008, http://www.dabeaz.com 2- Variable Assignment total = 0.0

    x = 42 151 • In dynamic languages, variables are just names for values • As the program runs, the value may change. • And it may change to a completely different type of data x = "foo" • Er..... didn't we already have this slide????
  337. Copyright (C) 2008, http://www.dabeaz.com 2- What is Assignment? x =

    42 152 • What does this do? • It assigns a value to a variable, yes. • But what does this do? y = x • And this...? z = [x,y] # A list/array with two items
  338. Copyright (C) 2008, http://www.dabeaz.com 2- Value Binding x = 42

    # Binding a value to a name y = x # Binding a value to a name items[2] = x # Binding to container location 153 • When programs run, values (i.e., data) get bound to different locations • But, what really happens? • In the above code, the value 42 has been assigned to three different places. • Does that mean there are three copies of 42 in memory? (Answer : It depends)
  339. Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Value x =

    42 y = x items[2] = x 154 • Assignments always make a local copy of whatever value is being stored. x 42 y items 42 42 2 These are all distinct objects even though they have the same value
  340. Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Value x =

    "... A string with 10 million characters ..." y = x items[2] = x 155 • But, consider this case: • Discuss amongst yourselves... • Maybe this one isn't so clear-cut. • Might depend on how strings were implemented.
  341. Copyright (C) 2008, http://www.dabeaz.com 2- Reference Variables # Perl $x

    = 42; $y = \$x; # Reference to $x print $$y,"\n"; # Dereference the value $$y = 37; # Reassign the value being referenced 156 • You might introduce special reference/pointer variables (Perl/PHP) • This lets you refer to data instead of making copies, but it also introduces pointers • That may or may not be a good thing
  342. Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Reference x =

    42 y = x items[2] = x 157 • All assignments merely make a reference to the value (like a pointer) x 42 y items 2 There is one object with value 42, and many locations point to it.
  343. Copyright (C) 2008, http://www.dabeaz.com 2- Assignment by Reference 158 •

    If everything is a reference, there are other issues. x 42 y items 2 • Are primitive types mutable? x = 37 • In general, you want immutable data to avoid making your head explode.
  344. Copyright (C) 2008, http://www.dabeaz.com 2- Reference Counting 159 • How

    do you track memory? x 42 y items 2 • Must keep reference counts on values or perform some kind of garbage collection ref=3 • Values (memory) will be reclaimed when no more references
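Reference counts can be observed directly in CPython, which is an implementation detail and an assumption of this sketch; other Python implementations use different garbage collectors. The example uses a list rather than 42 because small integers are treated specially by the interpreter:

```python
import sys

value = ["some", "data"]              # one object
before = sys.getrefcount(value)

other = value                         # a second name for the same object
items = [None, None, value]           # and a container slot

after = sys.getrefcount(value)
print(after - before)                 # 2 new references on CPython
```

The absolute numbers returned by `sys.getrefcount` include temporary references, so only the difference is meaningful here.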
  345. Copyright (C) 2008, http://www.dabeaz.com 2- Commentary 160 • Most modern

    dynamic languages assign by reference (Python, Ruby, Javascript, etc.) • This is one of the reasons why strings are immutable in Python/Javascript • You can check it out: >>> a = 42 >>> b = a >>> a is b True >>>
  346. Copyright (C) 2008, http://www.dabeaz.com 2- Some Perils 161 • Working

    with containers (lists/hashes) is very tricky in this model irb(main):001:0> a = [1,2,3,4] => [1, 2, 3, 4] irb(main):002:0> b = a => [1, 2, 3, 4] irb(main):003:0> b[2] = 99 => 99 irb(main):004:0> a => [1, 2, 99, 4] irb(main):005:0> • Since assignments only make references, you get shared references to the same object a b 1 2 99 4
  347. Copyright (C) 2008, http://www.dabeaz.com 2- Copying 162 • To make

    copies of containers, you have to take special steps irb(main):001:0> a = [1,2,3,4] => [1, 2, 3, 4] irb(main):002:0> b = a.clone => [1, 2, 3, 4] irb(main):003:0> b[2] = 99 => 99 irb(main):004:0> a => [1, 2, 3, 4] irb(main):005:0> b => [1, 2, 99, 4] Make a "copy" of the object
  348. Copyright (C) 2008, http://www.dabeaz.com 2- Shallow Copies 163 • Copies

    are often only shallow • You get a new object, but the values inside are copied by reference a b a.clone Values 1 2 3 4 99 b[2] = 99
  349. Copyright (C) 2008, http://www.dabeaz.com 2- More Peril 164 • Containers

    of containers (python) >>> a = [2,3,[100,101],4] >>> b = list(a) >>> a is b False • However, items in list copied by reference >>> a[2].append(102) >>> b[2] [100,101,102] >>> 100 101 102 2 3 4 a b This list is being shared
  350. Copyright (C) 2008, http://www.dabeaz.com 2- Deep Copies 165 • To

    actually copy data, you might have to execute a "deep copy" operation >>> a = [2,3,[100,101],4] >>> import copy >>> b = copy.deepcopy(a) >>> a[2].append(102) >>> b[2] [100,101] >>> • Recursively traverses through the object and copies everything that can be found. • (This is also an interesting CS problem)
  351. Copyright (C) 2008, http://www.dabeaz.com 2- Hidden Secrets 166 • Dynamic

    languages use various facets of references, immutable data, and this memory model to perform various kinds of optimization. • Example : Small integer caching. Small integers are frequently cached and reused. >>> a = 42 >>> b = 37 >>> c = b + 5 >>> c is a True >>> a 42 b 37 c
  352. Copyright (C) 2008, http://www.dabeaz.com 2- Hidden Secrets 167 • Example:

    Sharing dictionary keys and variable names stock = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10 } person = { 'name' : 'Dave', 'email' : '[email protected]' } 'name' name = "Mondo" • Programs may use much less memory than you think
  353. Copyright (C) 2008, http://www.dabeaz.com 2- Wrap-up 169 • A lot

    of material has been presented in this section, but there are three big take-aways • Data. A close look at primitive datatypes (numbers, reals, and strings) • Data structures. How to group data together (lists, arrays, etc.) • Assignment. What happens when you assign variables and manipulate values in a program.
  354. Copyright (C) 2008, http://www.dabeaz.com 2- More Information 170 • Everything

    else related to manipulating basic data is user-manual sorts of stuff • E.g., can look up how to append to a list in your favorite language. • Will get a chance to explore in the exercise
  355. Copyright (C) 2008, http://www.dabeaz.com 3- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  356. Copyright (C) 2008, http://www.dabeaz.com 3- Introduction 3 • This section

    explores the problem of structuring more complicated programs • Program structure and statements • Control flow structures • Functions • Exception handling
  357. Copyright (C) 2008, http://www.dabeaz.com 3- What is a Program? 5

    • A program is a series of statements • The statements perform various operations and generate some kind of result • When a program runs, it executes statements until there is nothing more to do • It seems pretty straightforward---although there are some thorny theoretical questions (e.g., "The Halting Problem").
  358. Copyright (C) 2008, http://www.dabeaz.com 3- An Example Program n =

    10 while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 6 • Count down from 10 • Yippee! T-minus 10 T-minus 9 T-minus 8 T-minus 7 ... T-minus 4 T-minus 3 T-minus 2 T-minus 1 Fizzle...
  359. Copyright (C) 2008, http://www.dabeaz.com 3- Statements 7 • Common types

    of statements • Assignment (=) • Conditions (if-else) • Looping (while, for, break, continue) • Functions (definitions and calls) • Error handling
  360. Copyright (C) 2008, http://www.dabeaz.com 3- Execution Environment n = 10

    while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 8 • Statements never run in isolation! • They always run inside an "environment" • This is where the variables live environment statements 'n' : 10 variables
  361. Copyright (C) 2008, http://www.dabeaz.com 3- Execution Environment n = 10

    while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 9 • As a program runs, the statements tend to do one of several things • They either modify the environment environment statements 'n' : 10 9 variables
  362. Copyright (C) 2008, http://www.dabeaz.com 3- Execution Environment n = 10

    while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 10 • Or they control the next statement that executes (control flow) environment statements 'n' : 9 variables
  363. Copyright (C) 2008, http://www.dabeaz.com 3- Execution Environment n = 10

    while n > 0: print "T-minus", n n = n - 1 print "Fizzle..." 11 • Or they perform some kind of input/output with the outside world environment statements 'n' : 9 variables
  364. Copyright (C) 2008, http://www.dabeaz.com 3- Commentary 12 • This is

    a fairly simple view, but most programs really don't do much more than what I've described • Of course, the devil is in the details • We're going to look at various facets of program structure and execution
  365. Copyright (C) 2008, http://www.dabeaz.com 3- Assignment of Values 14 •

    Assignment stores a value x = 42 avg = (x+y)/2 items[2] = 37 a["name"] = "Elvis" a.response = "Yeah" • General form of assignment location = expression • An expression represents a value • Location specifies the place where stored
  366. Copyright (C) 2008, http://www.dabeaz.com 3- Expressions 15 • An expression

    always represents a value • May involve various operations on data 42 # Literal x # Variable x + y # Math operator x[i+n] # Array lookup foo(x) # Function call (x+y) / (a+b) # Grouping • The syntax and set of operators is fairly standard across languages (minor variations)
  367. Copyright (C) 2008, http://www.dabeaz.com 3- Expressions 16 • Critical point

    : Expressions always get evaluated in the environment x = a * b environment statements 'a' : 3 variables • Unknown names result in an error Error! ??
  368. Copyright (C) 2008, http://www.dabeaz.com 3- Locations 17 • A location

    represents a place where a value is going to be "stored" (known as an "lvalue") • It might be a name x = 42 name = "Elvis" • But it might also involve an expression names[i+n] = "Elvis" • Key point : The left hand side must always represent a place where you can put a value
  369. Copyright (C) 2008, http://www.dabeaz.com 3- Locations 18 • Locations always

    refer to a place in the surrounding environment • Storing to an unknown name creates an entry • Storing to an existing place replaces the value a = 3 b = 4 x = [a, b] x[1] = 37 environment statements 'a' : 3 'b' : 4 'x' : [3, 4] variables 37
  370. Copyright (C) 2008, http://www.dabeaz.com 3- Storing Values 19 • What

    does it mean to "store" a value? • From last lecture, we saw that this can be more complicated than you might imagine • Might be by value or by reference
  371. Copyright (C) 2008, http://www.dabeaz.com 3- Assignment by Value 20 •

    Assign by value (make a copy) a = 42 b = a environment statements variables 'a' 'b' 42 42 • Here, there are two separate objects with the value 42
  372. Copyright (C) 2008, http://www.dabeaz.com 3- Assignment by Reference 21 •

    Assign by reference (copy pointers) a = 42 b = a environment statements variables • Here, there is one object with the value 42, but two names refer to it 42 'a' 'b'
  373. Copyright (C) 2008, http://www.dabeaz.com 3- Mutability 22 • Does assignment

    overwrite previous values by overwriting the memory? • Or does assignment make a new object? a = 42 ... a = 37 'a' 42 42 'a' 37 Overwrites 37 ref-- Rebind to new object • Answer : It depends on the language
  374. Copyright (C) 2008, http://www.dabeaz.com 3- Example : Python 23 •

    Overwriting a variable a = 42 b = a a = 37 environment statements variables 42 'a' 'b' 37 old new • You get a new value and the name is rebound • The old value may persist if used elsewhere
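The rebinding behavior on this slide can be checked directly. This is a minimal sketch in Python 3 syntax; the variable names follow the slide, and `same_object` is a hypothetical name added for illustration.

```python
a = 42
b = a                  # 'b' now refers to the same object as 'a'
same_object = a is b   # one object, two names

a = 37                 # rebinds 'a' to a new object; nothing is overwritten
print(a, b)            # 'b' still refers to the old value
```

The old value 42 persists because `b` still refers to it.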
  375. Copyright (C) 2008, http://www.dabeaz.com 3- Commentary 24 • The key

    point : Assignment is an operation that modifies the environment in which statements execute • Deep thought : The environment sure looks a lot like a hash table/dictionary/associative array
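To make the "environment looks like a dictionary" idea concrete, here is a toy model in Python. It is only an illustration of the concept: the helper names `assign` and `lookup` are hypothetical, and no real interpreter is claimed to work exactly this way.

```python
# A toy environment: variable names map to values.
env = {}

def assign(env, name, value):
    """Store a value: creates the entry if new, replaces it otherwise."""
    env[name] = value

def lookup(env, name):
    """Evaluate a name in the environment; unknown names are an error."""
    if name not in env:
        raise NameError("name %r is not defined" % name)
    return env[name]

assign(env, 'a', 3)                                      # a = 3
assign(env, 'b', 4)                                      # b = 4
assign(env, 'x', [lookup(env, 'a'), lookup(env, 'b')])   # x = [a, b]
lookup(env, 'x')[1] = 37                                 # x[1] = 37
```

Storing to an unknown name creates an entry, storing to an existing place replaces the value, and reading an unknown name fails, which matches the rules from the earlier slides.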
  376. Copyright (C) 2008, http://www.dabeaz.com 3- Conditional Execution 26 • Execute

    statements on condition # Python if x > 0: statements else: statements • Almost all languages do exactly what you would expect here • Condition is checked and only one branch runs # Ruby if x > 0 statements else statements end # Perl/PHP if ($x > 0) { statements } else { statements }
  377. Copyright (C) 2008, http://www.dabeaz.com 3- Conditional Expressions 27 • Conditions

    generally rely on the result of a conditional expression if condition statements • Usually, this is built from special operators expr == expr expr != expr expr < expr expr > expr expr <= expr expr >= expr condition and condition condition or condition not condition • Produces a true/false value
  378. Copyright (C) 2008, http://www.dabeaz.com 3- Examples 28 • Ruby irb(main):020:0>

    x < 3 => true irb(main):021:0> x < 3 and x > 42 => false irb(main):022:0> • Python >>> x < 3 True >>> x < 3 and x > 42 False >>>
  379. Copyright (C) 2008, http://www.dabeaz.com 3- What is Truth? 29 •

    A tricky issue : What happens here? if x statements • x is just some value (we don't know what) • For this to make sense, you have to know what it means for a value to be "True" • Believe it or not, there are some differing ideas about that.
  380. Copyright (C) 2008, http://www.dabeaz.com 3- What is Truth? 30 •

    Option 1: A value is true if it is non-zero, non-empty, or generally looks like it has an interesting value • This is probably the most common treatment # True Values x = 1 x = "Hello" x = [1,2,3] # False Values x = 0 x = "" x = [] x = None if x statements
  381. Copyright (C) 2008, http://www.dabeaz.com 3- What is Truth? 31 •

    Option 2: A value is true unless it is false or nil, i.e., whenever x refers to an actual value. • This is the approach Ruby takes if x statements # True Values x = 0 x = "Hello" x = [1,2,3] x = "" # False values x = nil x = false • Danger : 0 evaluates as True! • This is really a pointer check--does x point to a value?
  382. Copyright (C) 2008, http://www.dabeaz.com 3- What is Truth? 32 •

    Sometimes it gets messy. • In Perl, non-empty strings evaluate as True # Perl $x = "0"; if ($x) { print "Yes" } else { print "No" } "No" $x = "Hello"; # True $x = ""; # False • Well, it's true most of the time
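For comparison, Python follows Option 1. The sketch below (Python 3 syntax, with hypothetical variable names) checks the true and false values listed earlier, plus the string "0" that trips up the Perl example above.

```python
# Python: zero, empty containers, and None are false; everything else is true.
true_values = [1, "Hello", [1, 2, 3]]
false_values = [0, "", [], None]

all_true = all(bool(v) for v in true_values)
none_true = not any(bool(v) for v in false_values)

# Unlike Perl, Python treats the non-empty string "0" as true.
zero_string_is_true = bool("0")
```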
  383. Copyright (C) 2007, http://www.dabeaz.com 3- Evaluation of Relations • Relations

    only evaluate parts until the result can be determined if condition1 and condition2 statements # condition2 not evaluated if condition1 is False if condition1 or condition2 statements # condition2 not evaluated if condition1 is True 33 • Example: if x != 0 and y/x < 0.01 statements • Also known as "short-circuit" evaluation
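Short-circuit evaluation is easy to observe by recording which conditions actually run. A small Python sketch; the `check` helper and the `calls` list are hypothetical names used only for this demonstration.

```python
calls = []

def check(label, result):
    """Record that this condition was evaluated, then return its result."""
    calls.append(label)
    return result

# 'and' stops at the first false operand; 'or' stops at the first true one.
r1 = check("c1", False) and check("c2", True)
r2 = check("c3", True) or check("c4", False)

# The guard makes the division safe: y / x is never evaluated when x == 0.
x, y = 0, 5
safe = (x != 0 and y / x < 0.01)
```

After this runs, `calls` contains only the conditions that were actually evaluated, so the skipped ones (`c2` and `c4`) never appear.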
  384. Copyright (C) 2007, http://www.dabeaz.com 3- Statement Conditions • Statement with

    a condition 34 statement if condition; statement unless condition; • Examples: print "Hello Dave\n" unless ($name ne "Dave"); # Perl print "Hello Dave\n" unless $name != "Dave" # Ruby • This form is somewhat less common. • Personally, I'm not a huge fan... Here is a really long statement that looks like it will erase the entire filesystem ... NOT!
  385. Copyright (C) 2007, http://www.dabeaz.com 3- Switch Statements • Evaluation of

    code based on the value of a variable: 35 switch(variable) { case value1: statements case value2: statements case value3: statements default: statements } • This is not always supported and even if it is, there are subtle issues
  386. Copyright (C) 2007, http://www.dabeaz.com 3- Switch Statements • Do statement

    blocks "fall-through?" 36 switch(variable) { case value1: statements break case value2: statements case value3: statements break default: statements } statements If no break, execution falls through to the next case • This is the behavior of C/Java/Javascript, etc. • Fall-through may be disallowed (Ruby)
  387. Copyright (C) 2007, http://www.dabeaz.com 3- Chained if-elif-else • Python doesn't

    provide any kind of switch. You just chain if-elif-else statements 37 if condition: statements elif condition: statements elif condition: statements else: statements • Thinking : Having a separate switch statement just seems redundant if content == 'gif': ... elif content == 'png': ... elif content == 'jpg': ... else: print "Unknown content!" Example
  388. Copyright (C) 2007, http://www.dabeaz.com 3- Switch Implementation • The switch

    statement might be far more efficient than chained if-else depending on how it is implemented • Historically, compilers would turn switch into a jump table (a goto lookup table) 38 switch(variable) { case value1: statements case value2: statements case value3: statements default: statements } value1 value3 value2 loc1 loc3 loc2 loc1: statements ... loc2: statements ... loc3: statements
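In a language without switch, a similar effect can be sketched with a dictionary of functions. This is only an analogy to a compiler-generated jump table, not a description of how any compiler works; the handler names and the `dispatch` function are hypothetical.

```python
def handle_gif():
    return "decode GIF"

def handle_png():
    return "decode PNG"

def handle_jpg():
    return "decode JPEG"

def handle_unknown():
    return "Unknown content!"

# The table maps each case value directly to the code to run.
handlers = {
    'gif': handle_gif,
    'png': handle_png,
    'jpg': handle_jpg,
}

def dispatch(content):
    # One hash lookup instead of a chain of comparisons; the fallback
    # handler plays the role of the 'default:' case.
    return handlers.get(content, handle_unknown)()
```

The lookup cost does not grow with the number of cases, which is the same property that makes a jump table attractive compared with a long if-else chain.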
  389. Copyright (C) 2007, http://www.dabeaz.com 3- Switch Experiment • The implementation

    of switch in many dynamic languages seems to be hit or miss • I actually did a little experiment ("The Big Switch") 39 switch(variable) { case 1: statement case 2: statement ... case 999: statement } if (variable == 1) { statement } else if (variable == 2) { statement } else if { ... } else if (variable == 999) { statement } vs. variable == 999
  390. Copyright (C) 2007, http://www.dabeaz.com 3- Switch Experiment • PHP 40

    switch : 58.9 seconds else if : 108.1 seconds • Ruby switch : 151.5 seconds else if : 106.5 seconds • Javascript (in Firefox) switch : ~7 seconds else if : ???? minutes (didn't have the patience to wait) • A million repetitions
  391. Copyright (C) 2007, http://www.dabeaz.com 3- Looping on a Condition •

    while loops (universally supported) 42 while condition statement statement statement end • Only the syntax differs slightly • Sometimes you will find this variation do statement statement statement while condition
  392. Copyright (C) 2007, http://www.dabeaz.com 3- Loop Exit • To prematurely

    break out of a loop 43 while condition statement break # Terminates a loop statement end • Example: Python while True: line = f.readline() if line == 'END' : break # Various processing ...
  393. Copyright (C) 2007, http://www.dabeaz.com 3- Loop Continuation • To skip

    the rest of the statements and go back to the start of the loop 44 while condition statement continue # Go back to the top statement # Not executed end • Example: Python while True: line = f.readline() if line.startswith("#"): continue # Do more processing ...
  394. Copyright (C) 2007, http://www.dabeaz.com 3- For-loop (classic) • Looping with

    some kind of looping variable 45 for (init; condition; increment) { statements } • Example: for (i = 0; i < 10; i++) { print i } • This is really just a short-hand for this i = 0; while (i < 10) { print i; i++; }
  395. Copyright (C) 2007, http://www.dabeaz.com 3- For-loop (Iteration) • The more

    modern use of for is to loop over items of a collection (array, hash, etc.) 46 for item in collection statements end • Example: items = [1, 4, "Foo", "Bar"] # Ruby for x in items # x = 1, 4, "Foo", "Bar" ... end • Might be known as a "foreach" statement
  396. Copyright (C) 2007, http://www.dabeaz.com 3- Iteration • Looping over a

    collection is a very powerful concept • A collection could be many different things • An array, hash, set, string, file, etc. 47 f = open("foo.txt") for line in f: statements ...
  397. Copyright (C) 2007, http://www.dabeaz.com 3- Iteration Example • Looping over

    a collection of stocks 48 portfolio = [ ('GOOG',100, 490.10), ('IBM', 50, 91.10), ('AAPL', 75, 122.45), ('YHOO', 45, 28.42) ] for name, shares, cost in portfolio: # statements # ... • Notice how values get expanded into variables for you (very nice)
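A runnable version of the portfolio loop, in Python 3 syntax. The loop body (summing shares times cost into a hypothetical `total_cost` variable) is an assumption added for illustration; the slide leaves the statements unspecified.

```python
portfolio = [
    ('GOOG', 100, 490.10),
    ('IBM',   50,  91.10),
    ('AAPL',  75, 122.45),
    ('YHOO',  45,  28.42),
]

total_cost = 0.0
for name, shares, cost in portfolio:   # each tuple is unpacked into three names
    total_cost += shares * cost

print("Total cost: %.2f" % total_cost)
```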
  398. Copyright (C) 2007, http://www.dabeaz.com 3- Iteration Commentary • The whole

    concept of "iterating" over data is something that has been expanded greatly in dynamic languages • For instance, a large number of recent features in Python are just related to this • We'll see a lot more of this later 49
  399. Copyright (C) 2007, http://www.dabeaz.com 3- Looping-Else • The looping-else (Python)

    50 for x in s: statements else: statements • The else clause only runs if the loop runs all the way to completion without breaking for line in open("stocks.dat"): if 'IBM' in line: break else: print "Didn't find it"
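The same looping-else pattern, written so it can run without a data file. This sketch uses Python 3 syntax and a hypothetical `find_symbol` function over an in-memory list instead of "stocks.dat".

```python
def find_symbol(lines, symbol):
    """Return the first line containing symbol, or None if absent."""
    for line in lines:
        if symbol in line:
            found = line
            break              # leaving via break skips the else clause
    else:
        found = None           # runs only if the loop never hit 'break'
    return found

lines = ["GOOG 100 490.10", "IBM 50 91.10", "AAPL 75 122.45"]
```

Calling `find_symbol(lines, "IBM")` takes the break path, while a symbol that is not present falls through to the else clause.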
  400. Copyright (C) 2007, http://www.dabeaz.com 3- Looping-Redo/Retry • The redo statement

    (Ruby) 51 for x in s statements redo statements end • Restarts the body of the loop without updating the iteration variable • Retry : Restarts from the beginning for x in s statements retry statements end
  401. Copyright (C) 2007, http://www.dabeaz.com 3- Interlude • A significant number

    of things can be done using nothing but basic statements, conditions, and loops • For example: Writing scripts, data processing, etc. • I would suspect that a large number of programs actually use nothing more than these features to do various odd-jobs 52
  402. Copyright (C) 2007, http://www.dabeaz.com 3- Interlude • To write larger

    programs, you want to do more than this • Packaging code so that you can reuse it • Bundling code into modules and libraries 53
  403. Copyright (C) 2007, http://www.dabeaz.com 3- What is a Function? •

    Mathematically, it's an operation that accepts a bunch of inputs (arguments) and produces an output (the result) • Examples • sin(x) • f(x,y) -> 3x^2 + 2xy - 7 • However, this isn't a math class nor is it a theoretical programming languages course 55
  404. Copyright (C) 2007, http://www.dabeaz.com 3- What is a function? •

    A function is a named sequence of statements def funcname statement statement ... statement end 56 • If you want those statements to run, you just invoke the function name funcname
  405. Copyright (C) 2008, http://www.dabeaz.com 3- Example (Ruby) 57 def countdown

    n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" end countdown # Run the above function
  406. Copyright (C) 2007, http://www.dabeaz.com 3- Function Definition • Defining a

    function is actually an assignment in the environment 58 statements variables 'countdown' def countdown n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" end statement statement statement ... • The "value" of a function is the list of statements inside the body of the function
  407. Copyright (C) 2007, http://www.dabeaz.com 3- Function Definition • Functions are

    like data in dynamic languages • In fact, they can be redefined on-the-fly just like variables 59 • You can even redefine a function in the middle of running your program (try that in C++)
  408. Copyright (C) 2008, http://www.dabeaz.com 3- Example (Ruby) 60 def countdown

    n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" end countdown # Run the above function def countdown print "Boom!\n" end countdown # Run the new function
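The same redefinition works in Python, where `def` simply rebinds a name in the environment. A minimal sketch (return values replace the printing so the effect is easy to inspect; `original`, `first`, and `second` are hypothetical names):

```python
def countdown():
    return "Fizzle..."

first = countdown()          # calls the original definition

original = countdown         # a function is a value; bind a second name to it

def countdown():             # rebinds the name 'countdown'
    return "Boom!"

second = countdown()         # calls the new definition
```

The old function object is not destroyed by the redefinition; it stays reachable through `original`, just like any other value with a second name.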
  409. Copyright (C) 2007, http://www.dabeaz.com 3- Function Execution • What happens

    when you call a function? def countdown n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" end statement statement countdown statement statement 61 • Control passes to the first function statement • After the function is done, you go back to the statement after the function call
  410. Copyright (C) 2007, http://www.dabeaz.com 3- Function Execution • Each function

    call creates a new environment 62 statements variables 'countdown' statement statement statement ... statement statement countdown statement statement statements variables 'n' : 10 n = 10 while n > 0 printf("T-minus %d\n", n) n -= 1 end print "Fizzle...\n" countdown
  411. Copyright (C) 2007, http://www.dabeaz.com 3- Function Execution • Because every

    function call creates a new environment, everything that happens inside a function stays localized • A function can freely create new variables and modify its own environment • These changes don't affect anything else • The environment is destroyed when the function returns 63
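A small Python sketch of this localization. The global `n` and the list `steps` are hypothetical additions used to show that nothing inside the call leaks out.

```python
n = 100                      # a global

def countdown():
    n = 3                    # a new local 'n' in this call's environment
    steps = []
    while n > 0:
        steps.append(n)
        n -= 1
    return steps             # locals vanish once the call returns

result = countdown()
```

After the call, the global `n` is unchanged and the name `steps` does not exist outside the function.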
  412. Copyright (C) 2007, http://www.dabeaz.com 3- Problem • If a function

    executes in its own private environment, how do you get data in and out of the environment? • Passing parameters to a function • Returning results from a function 64
  413. Copyright (C) 2007, http://www.dabeaz.com 3- Function Arguments • To pass

    data into a function, use arguments def square(x) return x*x end 65 • However, an argument doesn't receive a value until the function is actually called a = square(3) argument • Arguments represent incoming values that will be bound to names when the function runs
  414. Copyright (C) 2007, http://www.dabeaz.com 3- Function Arguments 66 statements variables

    'x' : 3 return x*x square • Arguments get placed into the environment created for the function call def square(x) return x*x end square(3)
  415. Copyright (C) 2007, http://www.dabeaz.com 3- Function Returns • To return

    data from a function, use return def square(x) return x*x end 67 Return value • It is up to the caller to save the result (using assignment) statements variables 'r' : 9 r = square(3)
  416. Copyright (C) 2007, http://www.dabeaz.com 3- Commentary • On the surface,

    passing and returning values seems like it should be straightforward • However, there are a number of subtle issues that come up • Where do arguments get evaluated? • How do arguments get passed? 68
  417. Copyright (C) 2007, http://www.dabeaz.com 3- Argument Evaluation 69 statements variables

    'a' : 3 'b' : 4 'square' : <func> a = 3 b = 4 square(a+b) • Consider the following Must evaluate this expression • When calling a function, the arguments are usually fully evaluated first. • Known as "Applicative Evaluation Order"
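The evaluation order can be observed in Python by logging when each piece runs. The `traced` helper and the `log` list are hypothetical names introduced for this sketch.

```python
log = []

def traced(label, value):
    """Note that an operand was evaluated, then pass its value through."""
    log.append("evaluate " + label)
    return value

def square(x):
    log.append("enter square")
    return x * x

a = 3
b = 4
r = square(traced("a", a) + traced("b", b))   # a+b is computed before the call
```

The log shows both operands being evaluated before the function body is entered.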
  418. Copyright (C) 2007, http://www.dabeaz.com 3- Argument Passing • Question :

    How do the values get passed into a function? 70 statements variables 'x' 'y' func(x,y) value1 value2 def func(a,b) statements variables 'a' 'b' statement statement statement ? ? ? ?
  419. Copyright (C) 2007, http://www.dabeaz.com 3- Pass by Value • Function

    arguments are copies 71 statements variables 'x' 'y' func(x,y) value1 value2 def func(a,b) statements variables 'a' 'b' statement statement statement value1 value2 copy
  420. Copyright (C) 2007, http://www.dabeaz.com 3- Pass by Reference • Function

    arguments are references 72 statements variables 'x' 'y' func(x,y) value1 value2 def func(a,b) statements variables 'a' 'b' statement statement statement
  421. Copyright (C) 2007, http://www.dabeaz.com 3- Discussion • Pass by reference

    is often preferred because it is the most efficient way to pass containers (lists and hashes) to functions • For example, if you have a list with a million entries in it, you don't want to make a copy • However, be aware that modifications to an argument will affect the caller. 73
  422. Copyright (C) 2007, http://www.dabeaz.com 3- Reference Example • Modifications to

    mutable data types (e.g., lists, dicts) will be reflected in the original object--arguments are not copies. 74 def insert_sorted(s,val): for i,x in enumerate(s): if x > val: s.insert(i,val) break else: s.append(val) a = [10, 15, 50] insert_sorted(a,27) # a = [10, 15, 27, 50] Modifies the passed object
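One subtlety worth a quick sketch: in Python, modifying the passed object is visible to the caller, but rebinding the parameter name is not. The function names `mutate` and `rebind` are hypothetical.

```python
def mutate(s):
    s.append(99)       # modifies the object the caller passed in

def rebind(s):
    s = [0, 0, 0]      # rebinds the local name only; the caller is unaffected

a = [10, 15, 50]
mutate(a)              # 'a' gains an element
rebind(a)              # 'a' is unchanged by this call
```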
  423. Copyright (C) 2008, http://www.dabeaz.com 3- More on the Environment 75

    • Recall : All statements execute in an environment that holds variables • A thorny question : Are statements able to access variables that have been defined in other environments? • For example, can a function access variables that were defined outside of the function? (e.g., globals)
  424. Copyright (C) 2008, http://www.dabeaz.com 3- An Example 76 x =

    42 def foo y = 2*x x = 37 bar end def bar print x print y end foo
  425. Copyright (C) 2008, http://www.dabeaz.com 3- An Example 77 x =

    42 def foo y = 2*x x = 37 bar end def bar print x print y end foo Question 1: Does this "x" refer to the global value?
  426. Copyright (C) 2008, http://www.dabeaz.com 3- An Example 78 x =

    42 def foo y = 2*x x = 37 bar end def bar print x print y end foo Question 2: Does this assignment modify the global value? 37
  427. Copyright (C) 2008, http://www.dabeaz.com 3- An Example 79 x =

    42 def foo y = 2*x x = 37 bar end def bar print x print y end foo Question 3: Does this "y" refer to the "y" in foo (the caller)?
  428. Copyright (C) 2008, http://www.dabeaz.com 3- Lexical Scope 80 • Most

    programming languages deal with these questions using two-level "lexical scoping" • General idea : All variables either live in a "local" space or they live in a "global" space as determined by the structure of the source code. • Globals are the variables defined outside of function bodies • Locals are the variables defined inside functions
  429. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 81 • When

    a program starts, there is an empty global environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo start statements variables
  430. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 82 • Statements

    start populating the environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42
  431. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 83 • Statements

    start populating the environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42 'foo' : <func>
  432. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 84 • Statements

    start populating the environment x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42 'foo' : <func> 'bar' : <func>
  433. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 85 • Now

    consider a function call: x = 42 def foo y = 2*x x = 37 bar end def bar print x print y end foo statements variables 'x' : 42 'foo' : <func> 'bar' : <func> Call a function
  434. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 86 • Function

    calls create a new environment globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables foo : environment • Globals is the variable table from the global environment (previous slide)
  435. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 87 • Two-level

    variable lookup globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables foo : environment x? x? • When looking up a value, look in the variable table of the local environment first • If not found, look in globals (as a fallback)
  436. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 88 • Variable

    assignment globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 foo : environment • Assignment puts a new value in the locals
  437. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 89 • Variable

    assignment globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 foo : environment • But, what happens here? • Notice : There is nothing in the assignment statement that indicates where it goes x? x?
  438. Copyright (C) 2008, http://www.dabeaz.com 3- Discussion 90 • Should assignment

    overwrite previous values? globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 foo : environment • Is that a good idea or not? 37 ?
  439. Copyright (C) 2008, http://www.dabeaz.com 3- Assignment (Revisited) 91 • Consider

    this code fragment def foo() x = 42 y = 37 ... end • If reading this code, most programmers will interpret those variables as locals • It would be pretty damn weird if the behavior changed depending on whether or not someone defined a global with those names
  440. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 92 • Variable

    assignment should be local globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment • Python and Ruby operate like this • However, it's not so clear cut • Let's continue for now...
  441. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 93 • New

    function calls create more environments globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment print x print y statements variables bar : environment
  442. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 94 • Variable

    lookup (reprise) globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment print x print y statements variables bar : environment x? x? Variable lookup always looks in just two places
  443. Copyright (C) 2008, http://www.dabeaz.com 3- Two-level Scoping 95 • Variable

    lookup (reprise) globals 'x' : 42 'foo' : <func> 'bar' : <func> y = 2*x x = 37 bar statements variables 'y' : 84 'x' : 37 foo : environment print x print y statements variables bar : environment y? y? A name error. "y" is not defined.
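The two-level lookup rule from these slides can be modeled with plain dictionaries. This is a sketch of the rule as described here, not of any real interpreter; `lookup`, `global_vars`, `foo_locals`, and `bar_locals` are hypothetical names mirroring the diagram.

```python
def lookup(name, local_vars, global_vars):
    """Two-level lookup: locals first, then globals, else a name error."""
    if name in local_vars:
        return local_vars[name]
    if name in global_vars:
        return global_vars[name]
    raise NameError("%r is not defined" % name)

global_vars = {'x': 42}              # function entries omitted for brevity
foo_locals = {'y': 84, 'x': 37}      # foo's environment when it calls bar
bar_locals = {}                      # bar has assigned nothing

x_in_foo = lookup('x', foo_locals, global_vars)   # finds foo's local x
x_in_bar = lookup('x', bar_locals, global_vars)   # falls back to the global
```

Looking up 'y' from bar fails because foo's environment is never consulted: only bar's locals and the globals are searched.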
  444. Copyright (C) 2008, http://www.dabeaz.com 3- Assignment Problems 96 • Two-level

    scoping is an effective way of managing variables, but it has a problem • There is always this distinction between the local space and global space • Sometimes you want to assign values in either one of those spaces • Sometimes you actually do want to change a global variable.
  445. Copyright (C) 2008, http://www.dabeaz.com 3- Assignment Problems 97 • The

    management of the local/global variable space is one area where dynamic languages can get messy • Extreme peril in certain cases
  446. Copyright (C) 2008, http://www.dabeaz.com 3- Global Peril 98 • A

    Little Perl Experiment $x = 19; sub foo() { $x = 42; $y = 37; } foo(); print "$x\n"; # Produces 42 print "$y\n"; # Produces 37 • All variable assignments are global--including those inside the subroutine! • Yikes! This violates everything I covered!
  447. Copyright (C) 2008, http://www.dabeaz.com 3- Global Peril (Part 2) 99

    • The same experiment in Javascript x = 19; function foo() { x = 42; y = 37; } foo(); document.writeln(x); // Produces 42 document.writeln(y); // Produces 37 • Okay, I'm just a little disturbed (and you should be too)
  448. Copyright (C) 2008, http://www.dabeaz.com 3- Digression 100 • The problem

    here is that the syntax for assignment doesn't say anything about where a value is supposed to be stored x = 37 • No clear way to indicate that it's a local or a global variable • So, a language will do whatever its designers felt like it should do.
  449. Copyright (C) 2008, http://www.dabeaz.com 3- Declarations 101 • In static

    languages, this problem is solved through the use of "declarations" which precisely pin down the location int x; // Global void foo() { int y; // Local y = 2*x; // No issues here x = 37; } • Fine, but in dynamic languages you don't declare datatypes
  450. Copyright (C) 2008, http://www.dabeaz.com 3- Scope "Tagging" 102 • A

    common approach : Require variables with non-local scope to be tagged x = 19 def foo(): global x # The x below is global x = 42 y = 37 foo() print x # Produces 42 print y # NameError : y not defined • This approach is used by Python, PHP, Tcl
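To see what the tag actually changes, compare a function with and without it. A minimal Python sketch; `untagged`, `tagged`, and the two `after_...` variables are hypothetical names.

```python
x = 19

def untagged():
    x = 42             # no tag: creates a local 'x'; the global is untouched

def tagged():
    global x           # tag: assignments to 'x' now target the global
    x = 42

untagged()
after_untagged = x     # still the original global value
tagged()
after_tagged = x       # the global has been changed
```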
  451. Copyright (C) 2008, http://www.dabeaz.com 3- Variable Declarators 103 • Another

    approach : Allow variables to have optional scope "declarators" • This is an approach used by Perl • Except there is more to this (in a minute) $x = 19; sub foo() { local $x = 42; local $y = 37; } foo(); print "$x\n"; # Produces 19 print "$y\n"; # Produces nothing
  452. Copyright (C) 2008, http://www.dabeaz.com 3- Variable Declarators 104 • In

    Javascript, variables live where they are formally declared using "var" • If you leave off the var when defining, the variable is just global var x = 19; // A global function foo() { x = 42; var y = 37; // A local } foo(); document.writeln(x); // 42 document.writeln(y); // ReferenceError: y is not defined
  453. Copyright (C) 2008, http://www.dabeaz.com 3- Scope Symbols 105 • Ruby

    prepends variables with a special symbol to tell you where it's located • Here, you just look at the variable and you know where it lives Name # A constant name # A local variable $name # A global variable @name # An instance variable (objects) @@name # A class variable (objects) $x = 19; def foo $x = 42 # A global y = 37 # A local end
  454. Copyright (C) 2008, http://www.dabeaz.com 3- Perl Quiz Show 106 •

    Just when you thought you were safe, consider this bit of Perl code $x = 42; sub foo() { local $x = 37; bar(); } sub bar() { print "$x\n"; } bar(); # Prints 42 foo(); # Prints 37 (?!?!?!?!?!?!) • This is an example of "Dynamic Scope"
  455. Copyright (C) 2008, http://www.dabeaz.com 3- Perl Dynamic Scope 107 •

    Variables bind to the nearest definition on the function call stack $x = 42; sub foo() { local $x = 37; bar(); } sub bar() { print "$x\n"; } foo(); # Prints 37 globals $x = 42; foo() local $x = 37; bar() print "$x\n"; • This can get absolutely diabolical!
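Python has no dynamic scope, but the "nearest definition on the call stack" rule can be emulated with an explicit stack of variable tables. This sketch models the rule as stated on the slide, not Perl's actual implementation; `call_stack` and `dynamic_lookup` are hypothetical names.

```python
call_stack = [{'x': 42}]           # bottom frame holds the globals

def dynamic_lookup(name):
    """Search frames from the most recent call back toward the globals."""
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError("%r is not defined" % name)

def bar():
    call_stack.append({})          # bar defines no variables of its own
    try:
        return dynamic_lookup('x')
    finally:
        call_stack.pop()

def foo():
    call_stack.append({'x': 37})   # plays the role of Perl's 'local $x = 37'
    try:
        return bar()
    finally:
        call_stack.pop()

direct = bar()                     # called from the top level
via_foo = foo()                    # called through foo
```

The same function `bar` sees a different `x` depending on who called it, which is exactly what makes dynamic scope hard to reason about.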
  456. Copyright (C) 2008, http://www.dabeaz.com 3- Perl Lexical Scope 108 •

    Allows a variable to exist only in the block where it was defined $x = 42; sub foo() { my $x = 37; # Local variable bar(); } sub bar() { print "$x\n"; } bar(); # Prints 42 foo(); # Prints 42 • This takes us back to two-level scoping
  457. Copyright (C) 2008, http://www.dabeaz.com 3- Interlude 109 • Clearly there

    is quite a bit more to variable assignment than meets the eye • Especially related to where a value lives • The lack of formal declarations creates various sorts of chaos • Read the manual! • Ignore at your own peril
  458. Copyright (C) 2008, http://www.dabeaz.com 3- The Dangers of Functions 110

    • In dynamic languages, the usage of functions can be very free and easy • Especially compared to C++/Java
  459. Copyright (C) 2007, http://www.dabeaz.com 3- Conformance Checking • There is

    usually no checking or validation of function arguments. • A function will work on any data that is compatible with the statements in the function • Example (Python): def add(x,y): return x + y add(3,4) # 7 add("Hello","World") # "HelloWorld" add([1,2],[3,4]) # [1,2,3,4] 111
  460. Copyright (C) 2007, http://www.dabeaz.com 3- Conformance Checking • There is

    also rarely any checking of return values. • Inconsistent use does not result in an error • Example: def foo(x,y): if x: return x + y else: return # Inconsistent use of return (not checked) 112
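Running the function shows what the caller actually receives on each path. A minimal Python sketch; `r1` and `r2` are hypothetical names.

```python
def foo(x, y):
    if x:
        return x + y
    else:
        return           # bare return: the caller receives None

r1 = foo(1, 2)           # takes the first branch
r2 = foo(0, 2)           # takes the bare return
```

Nothing complains about the mismatch; the caller simply gets a number in one case and None in the other.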
  461. Copyright (C) 2007, http://www.dabeaz.com 3- Conformance Checking • If there

    are errors in a function, they will show up at run time (as an exception) • Example: def add(x,y): return x+y >>> add(3,"hello") Traceback (most recent call last): ... TypeError: unsupported operand type(s) for +: 'int' and 'str' >>> 113
  462. Copyright (C) 2007, http://www.dabeaz.com 3- Iteration (Revisited) • The process

    of looping over data is a very common programming operation • Example: names = [ "Dave", "Leo", "Nita" ] for name in names: print "Hello", name 115 • This is something programmers use all of the time without even thinking about it
  463. Copyright (C) 2007, http://www.dabeaz.com 3- Controlling Iteration • Some dynamic

    languages have the ability to turn iteration itself into some kind of "object" that you can manipulate 116 • Example : Generator Functions in Python • Example : Code blocks in Ruby • Disclaimer : This is an advanced topic---I'm just going to cover the basics now
  464. Copyright (C) 2007, http://www.dabeaz.com 3- Generator Functions • A function

    that, instead of returning a single value, stays alive and generates a sequence of results • Example: def countdown(n): print "Counting down!" while n > 0: yield n # Yield a value n -= 1 >>> for i in countdown(5): ... print i, ... Counting down! 5 4 3 2 1 >>> 117
  465. Copyright (C) 2007, http://www.dabeaz.com 3- Generator Functions • Generator functions

    are pretty odd • If you call one, it doesn't seem to do anything >>> c = countdown(5) >>> 118 • However, if you call .next() on the result, you'll see it start to run >>> c.next() Counting down! 5 >>> c.next() 4 >>>
  466. Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • It's like

    a normal function except that the "environment" is an object with a method that triggers statement execution. 119 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 5 countdown : environment suspended
  467. Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • Calling .next()

    runs statements until you reach a yield statement. That pops a value out of the function. 120 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 5 countdown : environment .next() 5
  468. Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • After yielding,

    the function suspends again 121 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 5 countdown : environment suspended
  469. Copyright (C) 2007, http://www.dabeaz.com 3- Generator Implementation • On the

    following .next(), it wakes up and continues where it left off until the next yield statement is encountered 122 print "Counting down!" while n > 0: yield n n -= 1 statements variables 'n' : 4 countdown : environment .next() 4 • This continues until there are no more statements
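The suspend-and-resume cycle can be stepped through by hand. This sketch uses Python 3 syntax, where the built-in `next(c)` takes the place of the `c.next()` method shown on the slides; `first`, `second`, and `rest` are hypothetical names.

```python
def countdown(n):
    print("Counting down!")
    while n > 0:
        yield n        # suspend here and hand 'n' to the caller
        n -= 1

c = countdown(3)       # nothing runs yet; 'c' is a generator object
first = next(c)        # runs up to the first yield
second = next(c)       # resumes after the yield, loops, yields again
rest = list(c)         # drains whatever is left
```

Once the function body runs out of statements, any further `next(c)` raises StopIteration, which is how a for-loop knows to stop.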
  470. Copyright (C) 2007, http://www.dabeaz.com 3- Using Generators • Generators separate

    the concept of iteration from code that uses the iteration 123 for i in countdown(5): print i, for i in countdown(5): print "T-minus", i for i in countdown(5): os.system("rm img%d.png" % i) Iteration Code block that uses the iteration • We will talk more about this in a later class
  471. Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • Instead

    of turning iteration into an object, Ruby flips the whole thing around and allows code blocks to be turned into objects. 124 def countdown(n) print "Counting down!\n" while n > 0 yield n n -= 1 end end countdown(5) { |i| puts i } Counting down! 5 4 3 2 1 Code block
  472. Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • Here,

    a block of code gets packaged into an object 125 countdown(5) { |i| puts i } puts i • That object is then passed into a function as part of the environment
  473. Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • The

    function runs normally until the yield statement is reached 126 print "Counting down!" while n > 0 yield n n -= 1 end statements variables 'n' : 5 <block> countdown : environment |i| puts i
  474. Copyright (C) 2007, http://www.dabeaz.com 3- Anonymous Code Blocks • Yield

    then produces a value that's fed into the code block which runs 127 print "Counting down!" while n > 0 yield n n -= 1 end statements variables 'n' : 5 <block> countdown : environment |i| puts i • The code block then executes in the environment where it was defined!
  475. Copyright (C) 2008, http://www.dabeaz.com 3- Anonymous Code Blocks • Once

    the code block runs, you go back to statements in the current function 128 print "Counting down!" while n > 0 yield n n -= 1 end statements variables 'n' : 5 <block> countdown : environment |i| puts i • This continues until there are no more statements
  476. Copyright (C) 2008, http://www.dabeaz.com 3- Implementation Details • It's important

    to realize that there is an environment switch going on here! sum = 0 countdown(5) { |i| sum += i } • When the code block gets executed, it runs in the outer environment---not in the environment of the countdown function • Note: This is related to "closures" (which we will cover in a few weeks)
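The environment switch can be imitated in Python by passing a nested function where Ruby would pass a block. This is only an analogy under stated assumptions: the `block` parameter and the `main` and `add` names are hypothetical, and calling `block(n)` stands in for Ruby's `yield n`.

```python
def countdown(n, block):
    # 'block' plays the role of the Ruby code block; calling it replaces yield
    while n > 0:
        block(n)
        n -= 1

def main():
    total = 0                 # lives in main's environment, not countdown's

    def add(i):
        nonlocal total        # the block updates the variable where it was defined
        total += i

    countdown(5, add)
    return total

result = main()
```

Although `add` is called from inside `countdown`, it reads and writes `total` in the environment of `main`, which is the behavior the slide describes.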
  477. Copyright (C) 2008, http://www.dabeaz.com 3- Commentary • These exotic features

    related to iteration are really just fancy tricks involving the execution environment • Generator : A function that can suspend itself and emit a value from its environment • Code block : A chunk of code that you can run, but which executes in the environment where it was defined. 130
  478. Copyright (C) 2008, http://www.dabeaz.com 3- Error Handling • If you

    write a program and it encounters an error, it normally aborts with some kind of traceback 132 >>> prices['SCOX'] Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'SCOX' >>> • Errors usually have a "type" and some informative diagnostics. • Programs can raise and catch errors
  479. Copyright (C) 2007, http://www.dabeaz.com 3- Exceptions (Python) • Catching an

    exception (try-except) try: statements except RuntimeError,e: print e • Raising an exception (raise) raise RuntimeError("Name not found") 133
  480. Copyright (C) 2007, http://www.dabeaz.com 3- Exceptions (Ruby) • Catching an

    exception (begin - rescue) begin statements rescue RuntimeError => e puts e end • Raising an exception (raise) raise RuntimeError,"Name not found" 134 • Exceptions are very similar in most languages • So, I will focus on Python.
  481. Copyright (C) 2007, http://www.dabeaz.com 3- Exceptions • Exceptions propagate to

    first matching except def foo(): try: bar() except RuntimeError,e: ... def bar(): try: spam() except RuntimeError,e: ... def spam(): grok() def grok(): ... raise RuntimeError("Whoa!") 135
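A runnable version of the call chain above shows which handler wins. This sketch uses Python 3 syntax (`except RuntimeError as e` instead of the slide's `except RuntimeError,e`); the `trace` list is a hypothetical addition used only to record what happened.

```python
trace = []

def grok():
    raise RuntimeError("Whoa!")

def spam():
    grok()                    # no handler here, so the exception keeps propagating

def bar():
    try:
        spam()
    except RuntimeError as e:
        trace.append("bar caught: %s" % e)   # first matching except on the way up

def foo():
    try:
        bar()
    except RuntimeError as e:
        trace.append("foo caught: %s" % e)   # never reached: bar already handled it

foo()
```

The exception passes straight through `spam` and stops at `bar`; `foo` never sees it.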
  482. Copyright (C) 2007, http://www.dabeaz.com 3- Exception Types • Exceptions usually

    have some kind of name ArithmeticError AssertionError EnvironmentError EOFError ImportError IndexError KeyboardInterrupt KeyError MemoryError NameError ReferenceError RuntimeError SyntaxError SystemError TypeError ValueError 136 • Consult reference
  483. Copyright (C) 2007, http://www.dabeaz.com 3- Exception Values • Most exceptions

    have an associated value • More information about what's wrong raise RuntimeError("Invalid user name") • Passed to variable supplied in except try: ... except RuntimeError,e: ... • Commonly a string, but may be any object 137
  484. Copyright (C) 2007, http://www.dabeaz.com 3- Catching Multiple Errors • Can

    catch different kinds of exceptions try: ... except LookupError,e: ... except RuntimeError,e: ... except IOError,e: ... except KeyboardInterrupt,e: ... 138 • Just chain together except clauses
  485. Copyright (C) 2007, http://www.dabeaz.com 3- Catching All Errors • Just

    leave off the exception type try: statements except: print "An error occurred" 139 • This isn't the best programming style (can be hellish to debug)
  486. Copyright (C) 2007, http://www.dabeaz.com 3- Reraising an Exception • It

    is usually possible to catch and re-raise the last exception try: ... except: print "An error occurred" raise # Re-raise the last error 140
  487. Copyright (C) 2007, http://www.dabeaz.com 3- finally statement • Specifies code

    that must run regardless of whether or not an exception occurs f = open("foo","r") try: ... finally: f.close() # Close file • Commonly used to properly manage resources (especially locks, files, etc.) • In Ruby, this is the "ensure" clause 141
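To see that the `finally` block runs on both the normal and the error path, here is a small sketch. The `risky` function and the `events` list are hypothetical stand-ins for acquiring and releasing a real resource such as a file or lock.

```python
events = []

def risky(fail):
    events.append("acquire")             # stand-in for opening a file or taking a lock
    try:
        if fail:
            raise ValueError("something went wrong")
        events.append("work done")
    finally:
        events.append("release")         # runs whether or not an exception occurred

risky(False)
try:
    risky(True)
except ValueError:
    events.append("error seen by caller")
```

In the failing call, "release" is recorded before the exception reaches the caller.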
  488. Copyright (C) 2007, http://www.dabeaz.com 3- Restartable Blocks • It is

    not always possible to retry code that generated an error • For example, in Python, execution always resumes after the try-except block • Ruby has a retry statement begin statements rescue # Determine if we can retry retry end 142
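Python has no `retry` statement, but a similar effect can be approximated by wrapping the try-except in a loop. The sketch below is one way to do it, not an equivalent of Ruby's feature; `with_retries`, `flaky`, and `max_attempts` are hypothetical names.

```python
def with_retries(operation, max_attempts=3):
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation(attempt)
        except RuntimeError:
            if attempt >= max_attempts:
                raise             # give up and let the error propagate
            # otherwise fall through and loop again (the "retry")

def flaky(attempt):
    # Pretend the first two attempts fail
    if attempt < 3:
        raise RuntimeError("not yet")
    return "succeeded on attempt %d" % attempt

outcome = with_retries(flaky)
```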
  489. Copyright (C) 2007, http://www.dabeaz.com 3- Executing Code Strings • Since

    most dynamic languages are interpreted, they typically have the ability to run their own code given as a string 144 s = """ for i in range(10): print "i =", i """ exec(s) x = eval("3 + 20/5") • The exact syntax varies
  490. Copyright (C) 2007, http://www.dabeaz.com 3- Executing Code Strings • Executing

    code strings is inherently a frightening concept 145 • The usual semantics is for the code string to execute as if it were typed directly into the program at the point of eval/exec statement • However, it is sometimes possible to execute strings in their own environment
  491. Copyright (C) 2007, http://www.dabeaz.com 3- Executing Code Strings • Executing

    code in a custom environment 146 v = { 'x' : 3, 'y' : 4 } a = eval("x+y",v) --> 7 • There are many opportunities for serious magic here.
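The same idea works for `exec`: statements can run against a dictionary that serves as their environment. A minimal Python 3 sketch follows (in Python 3 `exec` is a function; the `env` dictionary and the variable names inside the code string are made up for illustration).

```python
v = {'x': 3, 'y': 4}
a = eval("x + y", v)             # names are looked up in v, not in the caller

env = {}
exec("z = 10\nw = z * 2", env)   # assignments land in env instead of the caller's scope
w = env['w']
```

Because the code string only sees the dictionary it is given, this is also a common starting point for restricting what the string can touch, though it is not a security boundary on its own.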
  492. Copyright (C) 2007, http://www.dabeaz.com 3- Wrap-up • This section has

    really focused on the structure and control-flow of programs • The really big issues: • Statement execution environment • Global/local environment distinction • What happens during function calls • Will return to more of this later. 148
  493. Copyright (C) 2008, http://www.dabeaz.com 4- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  494. Copyright (C) 2008, http://www.dabeaz.com 4- Introduction 3 • In this

    section, we look at the problem of how to put larger programs together • Organizing programs into modules • Introduction to "objects" • Object oriented programming
  495. Copyright (C) 2008, http://www.dabeaz.com 4- Background 4 • The primary

    focus of this section is not on how to hack code (e.g. loops, variables, functions, algorithms, etc.) • It's more related to software engineering • How do you organize a million line program? • How do you make programs extensible? • How do you make programs maintainable?
  496. Copyright (C) 2008, http://www.dabeaz.com 4- Signs and Portents 5 •

    You have already been working with "objects" import csv reader = csv.reader(open("portfolio.dat")) a = [1,2,3] # A list (object) a.append(42) # Append to a list (a method) • Have probably encountered modules as well • However, you may not have thought much about why objects and modules behave the way that they do.
  497. Copyright (C) 2008, http://www.dabeaz.com 4- Historical Perspective 6 • The

    engineering aspects of developing software have been known for a long time • Currently, most of the work in this area is found under the banner of "Object Oriented Programming" • However, it is a lot like religion. There might be something redeeming about it, but it's not always clear what people are talking about.
  498. Copyright (C) 2008, http://www.dabeaz.com 4- Required Reading 7 • If

    you can own just one book on software engineering... • F. Brooks, "The Mythical Man-Month" • And when you're done, read T. Kidder, "The Soul of a New Machine."
  499. Copyright (C) 2008, http://www.dabeaz.com 4- Our Focus 8 • This

    class isn't about software project management (at least not directly) • Instead, we're going to focus a little bit on how object oriented programming came into existence • A lot of background on why people work with objects to begin with
  500. Copyright (C) 2008, http://www.dabeaz.com 4- An Experiment 9 • We're

    going to explore Smalltalk later • One of the earliest OO languages • By far, the most influential language on the development of later dynamic languages such as Ruby and Python. • Disclaimer : I might crash and burn with this. (Don't say I didn't warn you).
  501. Copyright (C) 2008, http://www.dabeaz.com 4- What is a Program? 11

    • A program is a series of statements • There are many different types of statements • Assignment, conditions, loops, exception handling, function calls, function definition, etc. • But what happens as a program grows?
  502. Copyright (C) 2008, http://www.dabeaz.com 4- Program Growth 12 • Problem

    : Editing the source code • As a program grows, its source becomes unwieldy to navigate in an editor. • You don't want to edit a 100000 line program that's been typed into one big file • As a practical matter, programmers don't like to edit files that are much longer than a few thousand lines.
  503. Copyright (C) 2008, http://www.dabeaz.com 4- Multiple Files 13 • Large

    programs involve multiple source files • Generally, you break it up by putting related functionality into the same file • However, this introduces a variety of new problems related to file management • Example : Separate compilation in C • Example : Management of the global namespace
  504. Copyright (C) 2008, http://www.dabeaz.com 4- Separate Compilation /* foo.c */

    extern int bar(int); void foo(int n) { ... x = bar(n); ... } 14 • If you split across files, you have to have some way to reference definitions in other locations (e.g., "extern") /* bar.c */ int bar(int n) { ... statements ... } foo.c bar.c
  505. Copyright (C) 2008, http://www.dabeaz.com 4- Global Namespace 15 • In

    addition, there is the question of how global symbols are managed. • For example, do all of the variable and function names have to be distinct? • In C/C++, the answer is generally "yes."
  506. Copyright (C) 2008, http://www.dabeaz.com 4- The Naming Problem 16 •

    If everything exists in the same global space, the problem of picking names becomes increasingly difficult as the program grows • An added problem arises if you start using programming libraries • Those libraries also need to pick unique names
  507. Copyright (C) 2008, http://www.dabeaz.com 4- Name Prefixing 17 • One

    solution : Name prefixing def Foo_bar() ... end def Foo_spam() ... end def Foo_grok() ... end • Group related functionality under a common name prefix (very common in C)
  508. Copyright (C) 2008, http://www.dabeaz.com 4- Other Problems 18 • Extensibility

    : How do you make it easy to extend a program with new features? • Code reuse : How do you make it easy to re-use parts of a program that you have already written? • Modularization : How do you make it easier to divide a program into pieces that many people can work on?
  509. Copyright (C) 2008, http://www.dabeaz.com 4- Extensibility 19 • Suppose you

    have written a big application that has to read data def read_data(source) ... end Mondo Application • And suppose there is a single function that reads input data (e.g., read_data)
  510. Copyright (C) 2008, http://www.dabeaz.com 4- Extensibility Example 20 • Now,

    modify the application to support reading data from the following file formats • CSV files • From a relational database • XML files • Scraped off HTML pages • Excel spreadsheets
  511. Copyright (C) 2008, http://www.dabeaz.com 4- Extensibility Example 21 • To

    implement this, you might implement many different versions of the functions: def read_data_csv(name) ... def read_data_xml(name) ... def read_data_db(name) ... def read_data_html(name) ... • But now, you have the problem of plugging these functions into the larger program.
  512. Copyright (C) 2008, http://www.dabeaz.com 4- Extensibility Example 22 • One

    solution : A dispatch function data_format = "xml" ... def read_data(name) if data_format == 'csv' d = read_data_csv(name) elif data_format == 'xml' d = read_data_xml(name) elif data_format == 'db' d = read_data_db(name) elif data_format == 'html' d = read_data_html(name) • Yow! How is this going to scale up in a huge programming project? (hint : It's not!)
  513. Copyright (C) 2008, http://www.dabeaz.com 4- Code Reuse 23 • Perhaps

    you have implemented some general purpose functionality def read_data_xml(source) ... # General purpose code to parse XML statements ... # Specific statements to process data statements ... • Maybe you want to re-use the more general parts of this code in other places • Question : How?
  514. Copyright (C) 2008, http://www.dabeaz.com 4- Modularization 24 • In large

    projects, there is a benefit to breaking a program up into small, self-contained modules • Each module can be maintained separately • Often by different groups of programmers • To make it work, you really have to think about the boundaries between modules (interfaces, versions, etc.)
  515. Copyright (C) 2008, http://www.dabeaz.com 4- Components 25 • Efforts to

    modularize code typically lead to the use of software "components" • Components are self-contained and have a well-defined programming interface (API) foo API • Applications constructed mostly by assembling and gluing components together
  516. Copyright (C) 2008, http://www.dabeaz.com 4- Components 26 • In the

    real world, software components may be written in entirely different languages • There is a whole industry surrounding the use of components • Example : COM, Active-X, etc. on Windows • This is also a major reason why people are using dynamic languages
  517. Copyright (C) 2008, http://www.dabeaz.com 4- Files 28 • When you

    start working on a program, it often begins as one source file • However, at some point, you will want to split it into two files • Splitting across files is probably the most fundamental division of source code. • It seems simple enough...
  518. Copyright (C) 2008, http://www.dabeaz.com 4- File Includes 29 • Most

    dynamic languages provide some kind of statement to load statements from another source code file • Examples: execfile("foo.py") # Python require 'foo.rb' # Ruby require "foo.pl"; # Perl require("foo.php"); # PHP
  519. Copyright (C) 2008, http://www.dabeaz.com 4- File Includes 30 • A

    file include typically executes the statements in the file as if they had been typed at the point where the include statement was placed • However, there are still some tricky issues lurking underneath the covers # Here's my big application require 'funcs.src' require 'utils.src' ... statements ...
  520. Copyright (C) 2008, http://www.dabeaz.com 4- Multiple Includes 31 • Can

    a file be included more than once? • require() is often a one-time operation. require 'foo.src' ... statements ... require 'foo.src' # Ignored • The one-time behavior is used to make programming libraries work correctly
  521. Copyright (C) 2008, http://www.dabeaz.com 4- Multiple Includes 32 • Example

    of library dependencies require 'spam.src' ... statements ... require 'spam.src' ... statements ... foo.src bar.src require 'foo.src' require 'bar.src' ... statements ... Don't want this operation to load 'spam.src' a second time (already loaded before)
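The one-time behavior can be modeled with a set of already-loaded names. The sketch below is a toy model, not how any particular language implements `require`: source files are represented by a hypothetical dictionary that maps each file name to the files it requires.

```python
loaded = set()       # names that have already been loaded
load_log = []        # order in which files were actually loaded

def require(name, sources):
    """Load the named source once; repeated requests are ignored."""
    if name in loaded:
        return False             # already loaded, nothing to do
    loaded.add(name)
    load_log.append(name)
    for dep in sources[name]:    # "executing" a file triggers its own requires
        require(dep, sources)
    return True

# Hypothetical dependency layout matching the slide
sources = {
    'spam.src': [],
    'foo.src': ['spam.src'],
    'bar.src': ['spam.src'],
    'app.src': ['foo.src', 'bar.src'],
}
require('app.src', sources)
```

Here `spam.src` is requested twice (once by `foo.src`, once by `bar.src`) but appears in `load_log` only once.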
  522. Copyright (C) 2008, http://www.dabeaz.com 4- Specifying Files 33 • Specifying

    a filename is trickier than it looks require './foo.src' require '/users/beazley/Projects/foo.src' require 'C:\Documents and Settings\Projects\foo.src' • As a general rule, it's bad practice to hard- code path names into a program (especially if it has to be moved around) • Solution : PATH variables • May be platform dependent
  523. Copyright (C) 2008, http://www.dabeaz.com 4- Path Settings 34 • Most

    languages will have some kind of internal variable that contains the list of search directories for file includes • Example : Ruby ($: variable) irb(main):001:0> puts $: /usr/lib/ruby/site_ruby/1.8 /usr/lib/ruby/site_ruby/1.8/powerpc-darwin8.0 /usr/lib/ruby/site_ruby/1.8/universal-darwin8.0 /usr/lib/ruby/site_ruby /usr/lib/ruby/1.8 /usr/lib/ruby/1.8/powerpc-darwin8.0 /usr/lib/ruby/1.8/universal-darwin8.0 . => nil
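Python's counterpart to Ruby's `$:` is the list `sys.path`. The short sketch below inspects it and temporarily adds a directory; the directory name is hypothetical and need not exist.

```python
import sys

search_dirs = list(sys.path)                  # copy of the current search path
sys.path.append("/hypothetical/project/lib")  # made-up directory, for illustration only
extended = sys.path[-1]                       # the entry we just added
sys.path.pop()                                # restore the original path
```

Since the path is ordinary program data, it can be changed at run time, which is convenient but also one reason file lookup can behave differently from one machine to another.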
  524. Copyright (C) 2008, http://www.dabeaz.com 4- Files to Directories 35 •

    As a program continues to grow, you will reach a point where you want to split files across multiple directories Formats/ png.src gif.src jpg.src tiff.src Parsing/ html.src xml.src csv.src
  525. Copyright (C) 2008, http://www.dabeaz.com 4- Packages 36 • A directory

    of related files is sometimes known as a "package" Formats/ png.src gif.src jpg.src tiff.src • To install, you need to put the package directory on the file search path. • But packages don't always play nice...
  526. Copyright (C) 2008, http://www.dabeaz.com 4- Conflicting File Names 37 •

    Consider two packages of source code /Blah/ foo.src bar.src spam.src grok.src /Yow/ bar.src spam.src • What happens if both packages are on the file search path and they include the same filename? # foo.src require 'spam.src' # bar.src require 'spam.src' ??
  527. Copyright (C) 2008, http://www.dabeaz.com 4- Commentary 38 • Virtually every

    dynamic language has implemented file inclusion in some sort of slightly broken way • It's a problem that seems like it should be easy, but which is hard and sneaky • Solutions usually focus on making the loading of files more abstract and high level • Example : Packages in Java, Python, Perl, etc.
  528. Copyright (C) 2008, http://www.dabeaz.com 4- Example 39 • High-level module/package

    import import java.io.*; // Java import os.path # Python use blah; # Perl • Here, the request to "import" code is not directly tied to low-level details concerning the file system • You still worry about configuration, but it's a little more controlled
  529. Copyright (C) 2008, http://www.dabeaz.com 4- The Naming Problem 41 •

    Even if you have multiple files, you still have issues with naming things • Recall that all statements execute inside an environment (that holds variables) • There is usually a global/local environment • Question : Do all files execute in the same global environment?
  530. Copyright (C) 2008, http://www.dabeaz.com 4- Namespaces 42 • A namespace

    is a named environment where program statements can execute • To break up a large program, different parts of the program can execute in different namespaces • This provides isolation between components • Namespace serves as a kind of "module"
  531. Copyright (C) 2008, http://www.dabeaz.com 4- Namespace Example 43 • Consider

    two different sets of statements x = 42 def square(y) return y*y end x = 10 def countdown(n) while n > 0 print "T-minus", n n -= 1 end print "Fizzle" end • If you do nothing, these statements live in the same space
  532. Copyright (C) 2008, http://www.dabeaz.com 4- Namespace Example 44 • Namespaces

    put statements into a named env namespace foo { x = 42 def square(y) return y*y end } namespace bar { x = 10 def countdown(n) while n > 0 print "T-minus", n n -= 1 end print "Fizzle" end } foo bar • Note : Exact syntax varies widely for this
  533. Copyright (C) 2008, http://www.dabeaz.com 4- Namespace Example 45 • Namespaces

    define separate environments foo bar 'x' : 42 'square' : <func> 'x' : 10 'countdown' : <func> namespace foo { x = 42 def square(y) return y*y end } namespace bar { x = 10 def countdown(n) while n > 0 print "T-minus", n n -= 1 end print "Fizzle" end }
  534. Copyright (C) 2008, http://www.dabeaz.com 4- Accessing Namespaces 46 • Although

    namespaces are isolated, you still need access to data/functionality contained in other namespaces • You need an access mechanism to cross module boundaries
  535. Copyright (C) 2008, http://www.dabeaz.com 4- Namespace Access 47 • Namespace

    selection syntax (::) 'x' : 42 'square' : <func> 'x' : 10 'countdown' : <func> foo bar foo::x foo::square bar::x bar::countdown
  536. Copyright (C) 2008, http://www.dabeaz.com 4- Namespace Access 48 • Sometimes

    a namespace is implemented as some kind of data or "object" in the language print foo.x print bar.x bar.countdown(10) a = foo.square(3) • Here, the "namespace" is something you can pass around and treat like data b = bar b.countdown(5)
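In Python, a namespace that behaves like data can be illustrated with `types.SimpleNamespace` from the standard library. This is a stand-in chosen for brevity (real Python modules behave the same way with respect to attribute access); the names `foo`, `bar`, and `square` mirror the slide.

```python
import types

# Two isolated namespaces, each with its own 'x'
foo = types.SimpleNamespace(x=42, square=lambda y: y * y)
bar = types.SimpleNamespace(x=10)

a = foo.square(3)      # access through the namespace, like foo::square
b = foo                # the namespace itself is just a value...
c = b.square(4)        # ...so it can be assigned, passed around, and used later
```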
  537. Copyright (C) 2008, http://www.dabeaz.com 4- Commentary 49 • With namespaces,

    you now have a mechanism for breaking code across files and isolating the execution of code to different environments • This is critical to the development of programming libraries and components • Library builders can isolate their code and have a reasonable assurance that it won't conflict with your code
  538. Copyright (C) 2008, http://www.dabeaz.com 4- Data Structures 51 • Programs

    need to work with data structures • For example, a graphics program might have to work with shapes like Circles and Rectangles. • Each shape will have some basic attributes struct Circle { double radius; }; struct Rectangle { double width; double height; }; { 'radius' : 4 } { 'width' : 4, 'height' : 5 } C Python
  539. Copyright (C) 2008, http://www.dabeaz.com 4- Methods 52 • In addition,

    there are functions that perform various operations on data • These are typically called "methods" • Some examples for shapes: • Compute the area • Compute the perimeter • Draw on the screen
  540. Copyright (C) 2008, http://www.dabeaz.com 4- Problem 53 • How do

    you bundle data structures and methods together in an effective way? • One approach : Use a namespace • Rationale : Namespaces keep code isolated. So, just put the functionality for each kind of data in a separate namespace.
  541. Copyright (C) 2008, http://www.dabeaz.com 4- Example : A Circle 54

    • Here is some code for a Circle (Python) # Circle.py import math def new(): c = { } return c def init(c,radius): c['radius'] = radius def area(c): return math.pi*c['radius']**2 def perimeter(c): return 2*math.pi*c['radius']
  542. Copyright (C) 2008, http://www.dabeaz.com 4- # Circle.py import math def

    new(): c = { } return c def init(c,radius): c['radius'] = radius def area(c): return math.pi*c['radius']**2 def perimeter(c): return 2*math.pi*c['radius'] Example : A Circle 55 • Here is some code for a Circle (Python) The namespace "Circle"
  543. Copyright (C) 2008, http://www.dabeaz.com 4- # Circle.py import math def

    new(): c = { } return c def init(c,radius): c['radius'] = radius def area(c): return math.pi*c['radius']**2 def perimeter(c): return 2*math.pi*c['radius'] Example : A Circle 56 • Here is some code for a Circle (Python) Create a container where we will store data related to the circle
  544. Copyright (C) 2008, http://www.dabeaz.com 4- # Circle.py import math def

    new(): c = { } return c def init(c,radius): c['radius'] = radius def area(c): return math.pi*c['radius']**2 def perimeter(c): return 2*math.pi*c['radius'] Example : A Circle 57 • Here is some code for a Circle (Python) Initialize a circle by storing some data (the radius) inside the container
  545. Copyright (C) 2008, http://www.dabeaz.com 4- # Circle.py import math def

    new(): c = { } return c def init(c,radius): c['radius'] = radius def area(c): return math.pi*c['radius']**2 def perimeter(c): return 2*math.pi*c['radius'] Example : A Circle 58 • Here is some code for a Circle (Python) Perform some kind of operation on a Circle
  546. Copyright (C) 2008, http://www.dabeaz.com 4- Example : Circle 59 •

    Here's how you would use the Circle >>> import Circle >>> c = Circle.new() >>> Circle.init(c,4) >>> Circle.area(c) 50.26548245743669 >>> Circle.perimeter(c) 25.132741228718345 >>> • Notice how the namespace (Circle) is encapsulating all of the functionality related to circles
  547. Copyright (C) 2008, http://www.dabeaz.com 4- Example : A Rectangle 60

    • Here is similar code for a Rectangle # Rectangle.py def new(): r = { } return r def init(r,width,height): r['width'] = width r['height'] = height def area(r): return r['width']*r['height'] def perimeter(r): return 2*(r['width']+r['height'])
  548. Copyright (C) 2008, http://www.dabeaz.com 4- Example : Shapes 61 •

    Example use >>> import Circle >>> import Rectangle >>> c = Circle.new() >>> Circle.init(c,4) >>> r = Rectangle.new() >>> Rectangle.init(r,4,5) >>> Circle.area(c) 50.26548245743669 >>> Rectangle.area(r) 20 >>> Circle.perimeter(c) 25.132741228718345 >>> Rectangle.perimeter(r) 18 >>>
  549. Copyright (C) 2008, http://www.dabeaz.com 4- Example : Shapes 62 •

    Example use The code for each shape is isolated in a separate module (namespace) >>> import Circle >>> import Rectangle >>> c = Circle.new() >>> Circle.init(c,4) >>> r = Rectangle.new() >>> Rectangle.init(r,4,5) >>> Circle.area(c) 50.26548245743669 >>> Rectangle.area(r) 20 >>> Circle.perimeter(c) 25.132741228718345 >>> Rectangle.perimeter(r) 18 >>>
  550. Copyright (C) 2008, http://www.dabeaz.com 4- >>> import Circle >>> import

    Rectangle >>> c = Circle.new() >>> Circle.init(c,4) >>> r = Rectangle.new() >>> Rectangle.init(r,4,5) >>> Circle.area(c) 50.26548245743669 >>> Rectangle.area(r) 20 >>> Circle.perimeter(c) 25.132741228718345 >>> Rectangle.perimeter(r) 18 >>> Example : Shapes 63 • Example use Here, we are creating and initializing some shapes (which are just dictionaries)
  551. Copyright (C) 2008, http://www.dabeaz.com 4- >>> import Circle >>> import

    Rectangle >>> c = Circle.new() >>> Circle.init(c,4) >>> r = Rectangle.new() >>> Rectangle.init(r,4,5) >>> Circle.area(c) 50.26548245743669 >>> Rectangle.area(r) 20 >>> Circle.perimeter(c) 25.132741228718345 >>> Rectangle.perimeter(r) 18 >>> Example : Shapes 64 • Example use Performing various operations on the shapes
  552. Copyright (C) 2008, http://www.dabeaz.com 4- Problem 65 • The code

    "works," but you have to be very specific about the methods you call. >>> Circle.area(c) 50.26548245743669 >>> Rectangle.area(r) 20 >>> • There is another issue: How do you know what kind of shape you have? s = ... # A shape of some kind # Calculate the area area = ????? # What do you do here?
  553. Copyright (C) 2008, http://www.dabeaz.com 4- Adding a "class" 66 •

    In order to distinguish different kinds of data, you can tag it with some kind of "class" # Circle.py import math def new(): c = { 'class' : 'Circle' } return c def init(c,radius): c['radius'] = radius def area(c): return math.pi*c['radius']**2 def perimeter(c): return 2*math.pi*c['radius'] An attribute that says what the data actually is
  554. Copyright (C) 2008, http://www.dabeaz.com 4- Adding a "class" 67 •

    Example : A Rectangle # Rectangle.py def new(): r = { 'class' : 'Rectangle'} return r def init(r,width,height): r['width'] = width r['height'] = height def area(r): return r['width']*r['height'] def perimeter(r): return 2*(r['width']+r['height'])
  555. Copyright (C) 2008, http://www.dabeaz.com 4- Method Dispatch 68 • Once

    data is tagged with a "class", you can create a high-level "dispatch function" def area(shape): if shape['class'] == 'Circle': return Circle.area(shape) elif shape['class'] == 'Rectangle': return Rectangle.area(shape) ... # Usage c = Circle.new(); Circle.init(c,4) s = Rectangle.new(); Rectangle.init(s,4,5) print area(c) # Calls Circle.area print area(s) # Calls Rectangle.area
  556. Copyright (C) 2008, http://www.dabeaz.com 4- A New Problem 69 •

    What if you wanted all shapes to have position information and some functions for movement? • Example : x,y coordinates and a function for moving the shape. • One approach : Modify every source file involving shapes... ugh.
  557. Copyright (C) 2008, http://www.dabeaz.com 4- Adding Positions 70 # Circle.py

    import math def new(): c = { 'class' : 'Circle' } return c def init(c,radius): c['radius'] = radius c['x'] = 0 c['y'] = 0 def move(c,dx,dy): c['x'] += dx c['y'] += dy def area(c): return math.pi*c['radius']**2 ... Adding new features to the Circle.
  558. Copyright (C) 2008, http://www.dabeaz.com 4- Adding Positions 71 # Rectangle.py

    def new(): r = { 'class' : 'Rectangle' } return r def init(r,width,height): r['width'] = width r['height'] = height r['x'] = 0 r['y'] = 0 def move(r,dx,dy): r['x'] += dx r['y'] += dy def area(r): return r['width']*r['height'] ... Adding new features to the Rectangle
  559. Copyright (C) 2008, http://www.dabeaz.com 4- Commentary 72 • Modifying code

    in every shape sucks! • Can't we put that code in one place and use it for all of the shapes? • Sure, just put it in a different namespace
  560. Copyright (C) 2008, http://www.dabeaz.com 4- Shape Module 73 • General

    purpose shape functions # Shape.py def init(shape): shape['x'] = 0 shape['y'] = 0 def move(shape,dx,dy): shape['x'] += dx shape['y'] += dy • All of this code is general (not shape specific)
  561. Copyright (C) 2008, http://www.dabeaz.com 4- A Circle 74 # Circle.py

    import Shape import math def new(): c = { 'class' : 'Circle' } return c def init(c,radius): Shape.init(c) # Set the positions c['radius'] = radius move = Shape.move # Copy move here def area(c): return math.pi*c['radius']**2 ...
  562. Copyright (C) 2008, http://www.dabeaz.com 4- A Rectangle 75 # Rectangle.py

    import Shape def new(): r = { 'class' : 'Rectangle' } return r def init(r,width,height): Shape.init(r) # Set the positions r['width'] = width r['height'] = height move = Shape.move # Copy move here def area(r): return r['width']*r['height'] ...
  563. Copyright (C) 2008, http://www.dabeaz.com 4- Example Usage 76 >>> import

    Circle >>> c = Circle.new() >>> Circle.init(c,4) >>> c['x'] 0 >>> Circle.move(c,3,7) >>> c['x'] 3 >>> Circle.area(c) 50.26548245743669 >>> Notice how Circles picked up the functionality we defined in Shape
  564. Copyright (C) 2008, http://www.dabeaz.com 4- A Subtle Detail 77 •

    Since Circles and Rectangles are using common functionality from Shape, we should probably make sure that both objects get created in a consistent way def new(): c = { 'class' : 'Circle' } return c def new(): r = { 'class' : 'Rectangle' } return r
  565. Copyright (C) 2008, http://www.dabeaz.com 4- Creating Shapes 78 # Shape.py

    def new(classname="Shape"): s = { 'class' : classname } return s # Circle.py def new(classname="Circle"): return Shape.new(classname) # Rectangle.py def new(classname="Rectangle"): return Shape.new(classname)
  566. Copyright (C) 2008, http://www.dabeaz.com 4- A New Problem 79 •

    What if you wanted to make a new kind of circle with just slight modifications? • Example : A circle with a color added to it
  567. Copyright (C) 2008, http://www.dabeaz.com 4- A Colored Circle 80 #

    ColoredCircle.py import Circle def new(classname="ColoredCircle"): c = Circle.new(classname) return c def init(c,color,radius): Circle.init(c,radius) c['color'] = color # Add a color value # Just use the same functions for area/perimeter area = Circle.area perimeter = Circle.perimeter
  568. Copyright (C) 2008, http://www.dabeaz.com 4- Terminology 81 • Specializing an

    existing object with new attributes or methods is called "inheritance" • You're "inheriting" all of the features of the original object, but making modifications
  569. Copyright (C) 2008, http://www.dabeaz.com 4- Function Dispatch 82 • We've

    set up a lot of machinery, but the problem of method dispatch is still horrible • Here's an example of what's wrong: import Rectangle, Circle, ColoredCircle # Create some shapes a = Rectangle.new(); Rectangle.init(a,4,5) b = Circle.new(); Circle.init(b,4) c = ColoredCircle.new(); ColoredCircle.init(c,"red",5) shapes = [a,b,c] for s in shapes: print area(s) # Compute area of whatever Not quite sure what to do here (depends on the shape)
  570. Copyright (C) 2008, http://www.dabeaz.com 4- Function Dispatch 83 • You

    can implement a sort of "hack" import sys def dispatch(s,name): classname = s['class'] module = sys.modules[classname] return getattr(module,name) • Example: shapes = [a,b,c] for s in shapes: print dispatch(s,"area")(s) • This looks up a method based on the classname
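The lookup through `sys.modules` can be tried without creating separate files. In the sketch below, the hypothetical helper `make_module` builds module objects with `types.ModuleType` and registers them by name; this is a stand-in for writing `Circle.py` and `Rectangle.py` on disk and importing them. Note that the looked-up function still has to be called with the shape as its argument.

```python
import math
import sys
import types

def make_module(name, **functions):
    # Stand-in for a real source file: build a module object and register it
    module = types.ModuleType(name)
    module.__dict__.update(functions)
    sys.modules[name] = module
    return module

make_module("Circle", area=lambda c: math.pi * c['radius'] ** 2)
make_module("Rectangle", area=lambda r: r['width'] * r['height'])

def dispatch(s, name):
    module = sys.modules[s['class']]   # find the namespace from the class tag
    return getattr(module, name)       # find the function by name inside it

shapes = [{'class': 'Circle', 'radius': 4},
          {'class': 'Rectangle', 'width': 4, 'height': 5}]
areas = [dispatch(s, "area")(s) for s in shapes]
```

The caller no longer needs to know which kind of shape it is holding; the `'class'` tag stored in the data selects the code.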
  571. Copyright (C) 2008, http://www.dabeaz.com 4- Where is This Going? 84

    • By now, it should be pretty clear • The code we have been writing has been building towards the concept of an "object" • Roughly speaking, an "object" is a way of packaging data and functions together • It ties most of what we just did together
  572. Copyright (C) 2008, http://www.dabeaz.com 4- The important bits 85 •

    The container used to hold object data is called an "instance." The data stored inside is called "instance data." • The namespace where all of the methods are defined is called a "class" • Borrowing methods from other classes is called "inheritance." • Dispatching is called "polymorphism"
  573. Copyright (C) 2008, http://www.dabeaz.com 4- Historical Perspective 86 • Programmers

    have been writing programs that do these sorts of things for a long time • For example, you can implement all of this in C or other simple languages • However, it's usually really clunky, verbose, and really hard to maintain
  574. Copyright (C) 2008, http://www.dabeaz.com 4- The Missing Link 87 •

    An "object oriented language" makes it a lot easier by taking care of low-level details • There is special syntax and other features >>> c = Circle(4.0) >>> r = Rectangle(4,5) >>> c.area() 50.26548245743669 >>> r.area() 20 >>> • So, let's talk about that...
  575. Copyright (C) 2008, http://www.dabeaz.com 4- A Bit of History :

    Simula 89 • The first "object oriented" programming language was Simula. • Simula was largely based on adding support for "objects" to Algol-60. • Strongly based on static compilers • Most of the core ideas in Simula later resurfaced in C++ and by extension in Java.
  576. Copyright (C) 2008, http://www.dabeaz.com 4- History : Smalltalk 90 •

    Smalltalk was also one of the first object-oriented programming languages • Initially developed at Xerox PARC (~1971) • Smalltalk-80 was first public release • Unlike Simula, it was a dynamic language (!)
  577. Copyright (C) 2008, http://www.dabeaz.com 4- Historical Quote 91 One day,

    in a typical PARC hallway bullsession, Ted Kaehler, Dan Ingalls, and I were standing around talking about programming languages. The subject of power came up and the two of them wondered how large a language one would have to make to get great power. With as much panache as I could muster, I asserted that you could define the "most powerful language in the world" in "a page of code." They said, "Put up or shut up." - Alan Kay, "The Early History of Smalltalk"
  578. Copyright (C) 2008, http://www.dabeaz.com 4- The Big Picture 92 •

    Almost all modern dynamic languages cite Smalltalk as an "influence" "The idea of [....] comes from Smalltalk" • I'll be honest, I've never written a single program in Smalltalk before this lecture. • But, what in the heck does it mean to be "influenced" by Smalltalk? • Let's go find out...
  579. Copyright (C) 2008, http://www.dabeaz.com 4- Smalltalk in Three Bullets 94

    • Everything in Smalltalk is an "object" • Objects hold state (data) • An object sends messages to and receives messages from other objects (or itself) ("That's all folks.")
  580. Copyright (C) 2008, http://www.dabeaz.com 4- Smalltalk as Text 95 •

    It is possible (but unusual) to program Smalltalk as a text-based language • For the examples that follow, I am using GNU smalltalk 3.0 • Fine for illustrating the general idea
  581. Copyright (C) 2008, http://www.dabeaz.com 4- Example : Integers 96 •

    Creating an object (an integer) x := 5 • The data stored by that object is the value (5) • The object has an associated class that indicates what kind of object it is st> x class SmallInteger st> • The object is called an "instance"
  582. Copyright (C) 2008, http://www.dabeaz.com 4- Object Hierarchy 97 • Classes

    are organized into a hierarchy Object Magnitude Number Integer Float SmallInteger LargeInteger Collection
  583. Copyright (C) 2008, http://www.dabeaz.com 4- Messages 98 • Once you

    have created an object, there is only one thing you do with it • You can send it a message. • That's it. • Nothing else. • Thus ends our tutorial of Smalltalk....
  584. Copyright (C) 2008, http://www.dabeaz.com 4- Messages 99 • Messages have

    two components selector parameter (opt) • It is first delivered to the object's class Object Magnitude Number Integer SmallInteger x := 5 Instance of SmallInteger selector | parm Message
  585. Copyright (C) 2008, http://www.dabeaz.com 4- Messages 100 • If not

    handled, it propagates to the superclass • This is an example of "inheritance" Object Magnitude Number Integer SmallInteger x := 5 Instance of SmallInteger selector | parm Message
  586. Copyright (C) 2008, http://www.dabeaz.com 4- Messages 101 • The message

    will propagate up the class hierarchy until a matching selector is found • At this point, the message is handled. Object Magnitude Number Integer SmallInteger x := 5 Instance of SmallInteger selector | parm Message selector Message Handler code
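The lookup rule from the last three slides (try the object's class, then each superclass in turn until some class handles the selector) fits in a few lines of Python. This is a toy model, not how GNU Smalltalk or any real VM is implemented. ToyClass, send, and the choice of which selector lives in which class are all hypothetical.

```python
# A toy model of Smalltalk-style message lookup.
class ToyClass:
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = methods or {}      # selector -> handler function

def send(receiver_class, receiver, selector, *args):
    """Walk up the hierarchy until some class handles `selector`."""
    cls = receiver_class
    while cls is not None:
        handler = cls.methods.get(selector)
        if handler is not None:
            return cls.name, handler(receiver, *args)
        cls = cls.superclass              # not handled here: try the parent
    raise AttributeError("did not understand #" + selector)

# Same chain as on the slide; the methods placed in each class are invented.
object_ = ToyClass("Object", methods={"isNil": lambda self: False})
magnitude = ToyClass("Magnitude", object_, {"max:": lambda self, other: max(self, other)})
number = ToyClass("Number", magnitude, {"abs": lambda self: abs(self)})
integer = ToyClass("Integer", number, {"isEven": lambda self: self % 2 == 0})
small_integer = ToyClass("SmallInteger", integer)

print(send(small_integer, 5, "isEven"))   # found in Integer
print(send(small_integer, -5, "abs"))     # found in Number
print(send(small_integer, 5, "max:", 9))  # found in Magnitude
```

Each call returns the name of the class that handled the message together with the result, which makes the upward search visible.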
  587. Copyright (C) 2008, http://www.dabeaz.com 4- Message Example 102 • How

    do you send a message? • Here's an example: st> x := 5. 5 st> x factorial. 120 st> x abs. 5 st> • In this case, we are sending a simple "unary" message (just a selector, no parameters) The object The message
  588. Copyright (C) 2008, http://www.dabeaz.com 4- More Messages 103 • Binary

    messages (+, -, /, *, etc.) • These take another object as a parameter st> x := 5. 5 st> x + 3. 8 st> x * 4. 20 st> • Here, the operator (+,*) is the selector and the value on the right is the parameter The message
  589. Copyright (C) 2008, http://www.dabeaz.com 4- Keyword Messages 104 • Named

    messages with parameters st> x := 5. 5 st> x raisedTo: 2 25 st> • Here, 'raisedTo:' is the selector and 2 is a parameter
  590. Copyright (C) 2008, http://www.dabeaz.com 4- Smalltalk is Different 105 •

    You'll now notice some pretty odd things st> x := 5 5 st> x + 3 * 4. 32 st> • There are no "operators" in Smalltalk, just messages which usually bind left to right ???? (x + 3) * 4. -> Send "+ 3" to x. Produces 8 8 * 4. -> Send "* 4" to 8. Produces 32
  591. Copyright (C) 2008, http://www.dabeaz.com 4- Message Precedence 106 • The

    three types of messages bind in this order • Unary messages • Binary messages • Keyword messages • Example: st> x := 5. 5 st> 3 * x raisedTo: 2 + 4. 11390625 st> (3 * x) raisedTo: (2 + 4) 15 raisedTo: 6 11390625
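To check the arithmetic on the last two slides, here is a small Python sketch that applies binary messages strictly left to right, the way Smalltalk does, and compares the result with Python's own operator precedence. The helper send_binary_chain and the OPS table are made up for this example.

```python
import operator
from functools import reduce

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def send_binary_chain(receiver, *messages):
    """Apply (selector, argument) pairs strictly left to right."""
    return reduce(lambda acc, msg: OPS[msg[0]](acc, msg[1]), messages, receiver)

x = 5
smalltalk_style = send_binary_chain(x, ("+", 3), ("*", 4))   # (5 + 3) * 4
python_style = x + 3 * 4                                     # 5 + (3 * 4)

# "3 * x raisedTo: 2 + 4": both binary messages bind before the keyword message.
keyword_example = (3 * x) ** (2 + 4)

print(smalltalk_style, python_style, keyword_example)
```

The first two values differ because Python gives * a higher precedence than +, while Smalltalk has no notion of operator precedence among binary messages.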
  592. Copyright (C) 2008, http://www.dabeaz.com 4- Datatypes 107 • Smalltalk has

    a useful set of datatypes • In fact, they mirror what you have seen so far. • Numbers, strings, arrays, dictionaries, etc.
  593. Copyright (C) 2008, http://www.dabeaz.com 4- Primitive Types 108 • Numbers

    x := 3. Integer x := 3.14159. Floating point x := 3/4. Fraction • Strings x := 'Hello World'. • Characters x := $a. The letter 'a'
  594. Copyright (C) 2008, http://www.dabeaz.com 4- Example : A "List" 109

    • An ordered collection of items items := OrderedCollection new. • Adding some items items add: 3 items add: 'Hello' items add: 7 items addFirst: 'Hey' items addLast: 'Foo' • Pulling out an item x := items at: 2 • Reassignment items at: 2 put: 'Yow!'
  595. Copyright (C) 2008, http://www.dabeaz.com 4- OrderedCollection 110 • Getting the

    size items size • Removing at an index items removeAtIndex: 2 • Remove by searching items remove: 'Hello' • Concatenation (,) x := items , otheritems
  596. Copyright (C) 2008, http://www.dabeaz.com 4- Dictionaries 111 • Creating a

    dictionary items := Dictionary new. • Inserting items at: 'name' put: 'Dave' items at: 3 put: 4 • Getting items items at: 'name' • Removing items removeKey: 'name'
  597. Copyright (C) 2008, http://www.dabeaz.com 4- Some Useful Things 112 •

    Displaying an object object display • Printing with a newline object printNl (note ends with lower-case 'L') • Inspecting an object (debugging) object inspect
  598. Copyright (C) 2008, http://www.dabeaz.com 4- Control Flow 113 • Remember,

    I said that Smalltalk only has objects and messages • THAT'S IT! • There are no "control flow" statements • No conditional statements • No looping statements • No function statements
  599. Copyright (C) 2008, http://www.dabeaz.com 4- Code Blocks 114 • Blocks

    of code are objects st> a := [ x := 3. y := 4. x + y ]. st> a a BlockClosure st> a value 7 st> An object holding the code A message telling the code block to run and produce a value
  600. Copyright (C) 2008, http://www.dabeaz.com 4- Code Blocks 115 • A

    code block can optionally take parameters st> b := [ :x :y | x + y ]. st> b value: 3 value: 4 7 st> • This gives you something that roughly looks like a function • But it's still just an object. You send it messages to get it to run.
  601. Copyright (C) 2008, http://www.dabeaz.com 4- Conditionals 116 • There are

    no conditional statements. Instead, you send a message to a boolean object st> x := 3. 3 st> y := 4. 4 st> (x < y) ifTrue: [ ... code block ... ] ifFalse: [ ... code block ... ]. st> • The message parameters are code blocks
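A rough Python analogue of "the conditional is a message sent to a boolean": two toy classes each implement the same method and simply run one of the two blocks they are given. The names TrueObject, FalseObject, if_true_if_false and less_than are hypothetical, and lambdas stand in for Smalltalk code blocks.

```python
# Toy boolean objects: the "if" is an ordinary method taking two blocks.
class TrueObject:
    def if_true_if_false(self, true_block, false_block):
        return true_block()       # a true object only ever runs the first block

class FalseObject:
    def if_true_if_false(self, true_block, false_block):
        return false_block()      # a false object only ever runs the second

def less_than(a, b):
    """Stand-in for sending `<` to a number; returns a toy boolean object."""
    return TrueObject() if a < b else FalseObject()

x, y = 3, 4
result = less_than(x, y).if_true_if_false(
    lambda: "x is smaller",
    lambda: "x is not smaller",
)
print(result)
```

The choice between branches is made by ordinary method dispatch on the receiver's class, not by a statement built into the language.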
  602. Copyright (C) 2008, http://www.dabeaz.com 4- Loops 117 • Loops are

    also messages involving code blocks st> x := 0. st> 3 timesRepeat: [x := x + 1]. 3 st> [ x < 100 ] whileTrue: [x := x + 1]. nil st> x 100 st> • It's a message with code blocks as parameters
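The same trick works for loops. In this Python sketch, times_repeat and while_true are hypothetical helpers that take callables where Smalltalk takes code blocks. The counter lives in a dictionary so that the blocks can update it.

```python
def times_repeat(n, block):
    """Rough analogue of `n timesRepeat: [...]`."""
    for _ in range(n):
        block()

def while_true(condition_block, body_block):
    """Rough analogue of `[cond] whileTrue: [body]`."""
    while condition_block():
        body_block()

state = {"x": 0}          # shared, mutable state for the blocks

def increment():
    state["x"] += 1

times_repeat(3, increment)
after_repeat = state["x"]

# Both the condition and the body are passed in as blocks.
while_true(lambda: state["x"] < 100, increment)
print(after_repeat, state["x"])
```

Note that the loop condition must itself be a block: it has to be re-evaluated on every pass, so a plain value would not do.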
  603. Copyright (C) 2008, http://www.dabeaz.com 4- Example : Dave's Mortgage 118

    principle := 500000. rate := 0.04. payment := 499.0. month := 0. total_paid := 0. [principle > 0 ] whileTrue: [ principle := principle*(1+(rate/12)) - payment. total_paid := total_paid + payment. month := month + 1. (month == 24) ifTrue: [ rate := 0.09. payment := 3999. ] ] 'Total paid ' display. total_paid printNl. 'Months' display. month printNl.
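For comparison, here is the same calculation written with ordinary Python control-flow statements (Python 3 syntax). It is a line-for-line transcription of the Smalltalk above, with the whileTrue: and ifTrue: messages replaced by while and if. No result is quoted here; run it to see the totals.

```python
principal = 500000.0
rate = 0.04
payment = 499.0
month = 0
total_paid = 0.0

while principal > 0:
    # Add one month of interest, then subtract the payment.
    principal = principal * (1 + rate / 12) - payment
    total_paid += payment
    month += 1
    if month == 24:
        # After two years the rate and the payment both jump.
        rate = 0.09
        payment = 3999.0

print("Total paid", total_paid)
print("Months", month)
```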
  604. Copyright (C) 2008, http://www.dabeaz.com 4- Iterating over Data 119 •

    Here's how you loop over a collection st> x := #(1 4 5 10 20). (1 4 5 10 20) st> x do: [:item | item printNl ]. 1 4 5 10 20 (1 4 5 10 20) st> • More code blocks
  605. Copyright (C) 2008, http://www.dabeaz.com 4- Classes 120 • Everything in

    Smalltalk is an object • You create your own objects by defining a class. • However, there is no special "class" statement. • Instead, you send a message
  606. Copyright (C) 2008, http://www.dabeaz.com 4- Defining a Class 121 •

    To create a new class, you send a message to the parent class (the superclass) • If you don't know what the parent is, you send a message to Object. The root of all objects. st> Object subclass: #Shape. Shape st> • Here, we are asking Object to create a subclass called "Shape."
  607. Copyright (C) 2008, http://www.dabeaz.com 4- Instance Data 122 • Objects

    have internal data • The members are set up in the class. st> Shape instanceVariableNames: 'x y'. Shape st> • This operation sets the names of instance variables of Shape objects • Enforces that Shapes will have x and y.
  608. Copyright (C) 2008, http://www.dabeaz.com 4- Creating Shapes 123 • To

    create a shape, you have to define new Shape class extend [ new [ | s | s := super new. s init. ^s ] ] • This is an example of a "class method" • It is a message that is sent to the class itself
  609. Copyright (C) 2008, http://www.dabeaz.com 4- Creating Shapes 124 • To

    create a shape, you have to define new Shape class extend [ new [ | s | s := super new. s init. ^s ] ] • This is an example of a "class method" • It is a message that is sent to the class itself A local variable
  610. Copyright (C) 2008, http://www.dabeaz.com 4- Creating Shapes 125 • To

    create a shape, you have to define new Shape class extend [ new [ | s | s := super new. s init. ^s ] ] • This is an example of a "class method" • It is a message that is sent to the class itself Sends 'new' to the parent class. The "parent" (superclass)
  611. Copyright (C) 2008, http://www.dabeaz.com 4- Creating Shapes 126 • To

    create a shape, you have to define new Shape class extend [ new [ | s | s := super new. s init. ^s ] ] • This is an example of a "class method" • It is a message that is sent to the class itself Send the 'init' message to the newly created instance
  612. Copyright (C) 2008, http://www.dabeaz.com 4- Creating Shapes 127 • To

    create a shape, you have to define new Shape class extend [ new [ | s | s := super new. s init. ^s ] ] • This is an example of a "class method" • It is a message that is sent to the class itself Return the instance
  613. Copyright (C) 2008, http://www.dabeaz.com 4- Example of Creating 128 •

    Here's some sample output st> s := Shape new. Object: Shape new "<0x40292bb0>" error: did not understand #init ... st> • It didn't work because we didn't define init yet
  614. Copyright (C) 2008, http://www.dabeaz.com 4- Initializing Shapes 129 • Here's

    a definition of init Shape extend [ init [ x := 0. y := 0. ] ] • This just sets up the instance variables st> s := Shape new. a Shape st>
  615. Copyright (C) 2008, http://www.dabeaz.com 4- Instances 130 • Every object

    you create is called an "instance" • All of the internal data is completely private • There is no way to inspect it from outside • To do anything, you make the object respond to messages (by implementing methods)
  616. Copyright (C) 2008, http://www.dabeaz.com 4- Viewing Attributes 131 • Create

    methods that return internal state Shape extend [ x [ ^x. ] y [ ^y. ] ] • These respond to messages st> s x. 0 st> s y. 0 st>
  617. Copyright (C) 2008, http://www.dabeaz.com 4- Methods with Parameters 132 •

    Let's make a shape move Shape extend [ movex: dx [ x := x + dx. ] movey: dy [ y := y + dy. ] ] • These also correspond to messages st> s movex: 3. a Shape st> s movey: 2. a Shape st> s x. 3 st> s y. 2 st>
  618. Copyright (C) 2008, http://www.dabeaz.com 4- Messages to Self 133 •

    Here's a method that sends some messages Shape extend [ movene: distance [ self movex: distance. self movey: distance. ] ] • Example use: st> s movene: 4. a Shape st> s x. 4 st> s y. 4 st> self is the instance
  619. Copyright (C) 2008, http://www.dabeaz.com 4- Making a Circle 134 Shape

    subclass: #Circle. Circle instanceVariableNames: 'radius'. Circle class extend [ new: radius [ |c| c := super new. c init: radius. ^c ] ] Circle extend [ init: rad [ radius := rad. ] area [ |a| a := 3.1415926*(radius raisedTo: 2). ^a ] ]
  620. Copyright (C) 2008, http://www.dabeaz.com 4- Using a Circle 135 st>

    c := Circle new: 4. a Circle st> c area 50.26548 st> c x 0 st> c movex: 3. st> c x 3 st> • Notice how it responds to Shape messages
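As a point of comparison, here is roughly the same Shape/Circle pair in Python 3. It is a sketch rather than a translation: Python folds Smalltalk's separate new and init steps into __init__, and the instance variables are not private.

```python
class Shape:
    def __init__(self):
        self.x = 0
        self.y = 0

    def movex(self, dx):
        self.x += dx

class Circle(Shape):
    def __init__(self, radius):
        super().__init__()          # plays the role of `super new` plus Shape's init
        self.radius = radius

    def area(self):
        return 3.1415926 * self.radius ** 2   # same constant as the slide

c = Circle(4)
c.movex(3)                          # not defined in Circle: found in Shape
print(c.area(), c.x)
```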
  621. Copyright (C) 2008, http://www.dabeaz.com 4- Code Block Example 136 Shape

    extend [ movex: dx count: n action: code [ n timesRepeat: [ self movex: dx. code value: self. ] ] ] • Example use: st> c movex: 2 count: 5 action: [:shp |shp x printNl. ] 0 2 4 6 8 st>
  622. Copyright (C) 2008, http://www.dabeaz.com 4- Class Variables/Methods 137 • Almost

    everything we have been doing has focused on instances. • However, the class itself is an object • The class can have its own variables (called class variables) • A class can have its own methods (called class methods)
  623. Copyright (C) 2008, http://www.dabeaz.com 4- Class Method Example 138 •

    Here's a sample definition Shape class extend [ foo [ 'Hello World' printNl. ] ] • Here's a use st> s := Shape new. a Shape. st> s foo. Object: Shape new "<0x402960b8>" did not understand #foo st> Shape foo. Hello World st> A method on the class
  624. Copyright (C) 2008, http://www.dabeaz.com 4- Class Variables 139 • Set

    up when the class is created Object subclass: #Shape instanceVariableNames: 'x y' classVariableNames: 'ncreate' poolDictionaries: '' category: nil ! • Can be accessed in class methods Shape class extend [ ncreate [^ncreate] new [ |s| s := super new. s init. (ncreate = nil) ifTrue: [ncreate := 1] ifFalse: [ncreate := ncreate + 1]. ^s.] ]
  625. Copyright (C) 2008, http://www.dabeaz.com 4- Class Variables 140 • Example

    use: st> s := Shape new. st> s ncreate. Object: Shape new "<0x402988d8>" error: did not understand #ncreate st> Shape ncreate. 1 st> • Again: Notice that it's part of the class
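A rough Python counterpart to the instance counter, assuming a class attribute named ncreate and a class method named created (both names chosen for this sketch):

```python
class Shape:
    ncreate = 0                    # class variable: one copy, stored on the class

    def __init__(self):
        self.x = 0
        self.y = 0
        Shape.ncreate += 1         # update the class, not the instance

    @classmethod
    def created(cls):
        return cls.ncreate

a = Shape()
b = Shape()
print(Shape.created())
print("ncreate" in vars(a))        # the counter is not stored in the instance
```

One difference from the Smalltalk session: Python lets you call a.created() on an instance, because attribute lookup falls through to the class. In Smalltalk the instance does not understand the message.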
  626. Copyright (C) 2008, http://www.dabeaz.com 4- Interesting Stuff 141 • All

    objects in Smalltalk are "open" • You can add and modify methods of both instances and classes at any time. • Essentially, you can make instances respond to new kinds of messages at will (even after creation) • On the other hand, you can't really add new instance variables after creation.
  627. Copyright (C) 2008, http://www.dabeaz.com 4- Interesting Stuff 142 • The

    Smalltalk environment itself is an object • It turns out that the assignment operator (:=) is actually a message as well. st> Smalltalk at: #x put: 42. 42 st> x 42 st> • The whole language is objects and messages.
  628. Copyright (C) 2008, http://www.dabeaz.com 4- Smalltalk Wrap-up 143 • Smalltalk

    has been hugely influential • GUIs • Graphical IDEs • Object Oriented Concepts • Object implementation in Dynamic Languages
  629. Copyright (C) 2008, http://www.dabeaz.com 4- Wrap-up 145 • We have built up

    some of the basic parts of how big programs get put together. • Next time, we'll look at specifics of current dynamic languages
  630. Copyright (C) 2008, http://www.dabeaz.com 5- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  631. Copyright (C) 2008, http://www.dabeaz.com 5- Introduction 3 • Last time,

    we looked at problems related to creating large programs • Took a detour to go look at Smalltalk, one of the first and most influential object-oriented languages • Today, we're going to look at how all of this gets put together in modern languages
  632. Copyright (C) 2008, http://www.dabeaz.com 5- Overview 4 • A brief

    review of concepts • The Ruby object model (in depth) • The Python object model (in depth) • The Perl object model (brief survey) • The Javascript object model (brief survey)
  633. Copyright (C) 2008, http://www.dabeaz.com 5- What is an Object? 6

    • An object is a programming abstraction that bundles two things together • Data • Methods that operate on the data • For example : A Circle • Data : The radius • Methods : area(), perimeter(), etc.
  634. Copyright (C) 2008, http://www.dabeaz.com 5- Instances 7 • When you

    create objects, you are creating "instances" • Each instance of an object has its own internal data (instance variables) • Examples : Instances of circles .radius=6 .radius=3 .radius=4 .radius=9
  635. Copyright (C) 2008, http://www.dabeaz.com 5- Instance Methods 8 • Functions

    that operate on instances of objects are known as "instance methods" • For example: Compute the area of a circle • The result depends on the circle instance that you supply to the method
  636. Copyright (C) 2008, http://www.dabeaz.com 5- Classes 9 • Instance methods

    are not stored as part of the instances themselves. • They are found in an associated class • Instances are always linked back to a class class Circle area() perimeter() ... .radius=6 .radius=3 .radius=4 Circle instances
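You can see this split directly in Python, which exposes both namespaces as dictionaries. A small sketch (the Circle class here is just an example):

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

c = Circle(6)
print(vars(c))                     # only instance data: {'radius': 6}
print("area" in vars(c))           # False: the method is not stored on c
print("area" in vars(Circle))      # True: it lives in the class namespace
print(type(c) is Circle)           # True: the instance links back to its class
```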
  637. Copyright (C) 2008, http://www.dabeaz.com 5- Class Variables 10 • A

    class may define its own variables known as "class variables" • These variables act as a kind of "global variable" for everything in the class class Circle area() perimeter() ... ncircles = 3 .radius=6 .radius=3 .radius=4 Circle instances class variable
  638. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 11 • A class

    can inherit from other classes • Each class has a link to its superclass (parent) class Circle area() perimeter() ... class Shape move() ...
  639. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 12 • The whole

    point of inheritance is to borrow or modify existing functionality • For example, a Circle picks up all of the functionality that was defined for shapes • And it can modify that functionality if it wants
  640. Copyright (C) 2008, http://www.dabeaz.com 5- Classes as Objects 13 •

    In some languages, a class is itself an object (an instance of a "class") • The "data" stored in a class object consists of the instance methods and class variables class Circle area() perimeter() class Shape move() class Stock sell() cost() Instances of "classes"
  641. Copyright (C) 2008, http://www.dabeaz.com 5- Class Methods 14 • A

    class may define methods that operate on the class itself (as an object) • These are known as "class methods" • An example : the new method • New is a class method that asks the class to create a new instance
  642. Copyright (C) 2008, http://www.dabeaz.com 5- Static Methods 15 • A

    normal function that just happens to be placed in a class for the purposes of packaging • It has no relation to instances or classes • It's just placed into the class namespace • This is more of a C++/Java oddity
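Python happens to support both kinds of method, so the distinction between the last two slides can be shown side by side. The names unit and degrees_to_radians are made up for this sketch.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @classmethod
    def unit(cls):
        """Class method: receives the class and asks it for a new instance."""
        return cls(1)

    @staticmethod
    def degrees_to_radians(deg):
        """Static method: an ordinary function parked in the class namespace."""
        return deg * math.pi / 180

u = Circle.unit()                          # operates on the class itself
print(u.radius, Circle.degrees_to_radians(180))
```

The class method gets the class as its first argument. The static method gets nothing at all, which is why it has no relation to instances or classes.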
  643. Copyright (C) 2008, http://www.dabeaz.com 5- Objects in Ruby 17 •

    Ruby is object oriented "I wanted a scripting language that was more powerful than Perl, and more object-oriented than Python. That's why I decided to design my own language (Ruby)." - Matz (creator of Ruby) • What it really means : Matz likes Smalltalk.
  644. Copyright (C) 2008, http://www.dabeaz.com 5- Everything is an Object 18

    • Objects in Ruby are organized in a hierarchy Object Numeric Integer Fixnum Float Bignum String ... • At the top, you have "Object"
  645. Copyright (C) 2008, http://www.dabeaz.com 5- Everything is an Object 19

    • How to navigate the hierarchy x.class # The class to which x belongs cls.superclass # The superclass of a class cls • Example: irb(main):025:0> x = 37 => 37 irb(main):026:0> x.class => Fixnum irb(main):027:0> Fixnum.superclass => Integer irb(main):028:0> Integer.superclass => Numeric irb(main):029:0> Numeric.superclass => Object irb(main):030:0>
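For comparison, the same kind of navigation in Python 3. The hierarchy is different from Ruby's (Python integers sit directly under object), so treat this as an analogue rather than an equivalent.

```python
x = 37
print(type(x))            # the class to which x belongs
print(int.__mro__)        # the chain of superclasses, most specific first
print(bool.__mro__)       # a slightly deeper built-in chain
```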
  646. Copyright (C) 2008, http://www.dabeaz.com 5- Objects Have Methods 20 •

    Example : Integers irb(main):040:0> x = 37 => 37 irb(main):041:0> x.abs => 37 irb(main):042:0> x.between?(2,100) => true irb(main):043:0> x.to_s => "37" irb(main):044:0> x.remainder 20 => 17 irb(main):045:0>
  647. Copyright (C) 2008, http://www.dabeaz.com 5- Inspecting Methods 21 • Get

    a list of method names x.methods • Example: irb(main):025:0> x = 37 => 37 irb(main):026:0> x.methods => ["method", "%", "between?", "send", "<<", "prec", "modulo", "&", "object_id", ">>", "zero?", "size", "singleton_methods", "__send__", "equal?", "taint", "id2name", "*", "next", "frozen?", "instance_variable_get", "+", "kind_of?", "step", "to_a", "instance_eval", "-", "remainder", ...]
  648. Copyright (C) 2008, http://www.dabeaz.com 5- Defining New Objects 22 •

    To create an object, you first define a class class Circle def initialize(radius) @radius = radius end def area Math::PI * @radius ** 2 end def perimeter 2*Math::PI * @radius end end • A class is mainly just a collection of methods
  649. Copyright (C) 2008, http://www.dabeaz.com 5- Instance Variables 23 • Instance

    variables are denoted by @varname class Circle def initialize(radius) @radius = radius end def area Math::PI * @radius ** 2 end def perimeter 2*Math::PI * @radius end end Instance variables • These variables are storing the data that is unique to each instance that is created
  650. Copyright (C) 2008, http://www.dabeaz.com 5- Initialization 24 • initialize is

    called when an object is created class Circle def initialize(radius) @radius = radius end def area Math::PI * @radius ** 2 end def perimeter 2*Math::PI * @radius end end • The name of this method is "special". Ruby expects initialization to use this specific name.
  651. Copyright (C) 2008, http://www.dabeaz.com 5- Creating Instances 25 • To

    create instances, you use new c = Circle.new(4) d = Circle.new(9) ... • This calls initialize with the supplied argument c = Circle.new(4) class Circle def initialize(radius) @radius = radius end ...
  652. Copyright (C) 2008, http://www.dabeaz.com 5- Calling Methods 26 • To

    call methods, you just need an instance c = Circle.new(4) d = Circle.new(9) ... puts c.area # Calls the area method on c puts d.area # Calls the area method on d puts c.perimeter • Inside methods, the instance variables (@vars) bind to the values stored in the instance.
  653. Copyright (C) 2008, http://www.dabeaz.com 5- Inspecting Objects 27 • As

    a debugging aid, you can inspect objects c = Circle.new(4) ... puts c.inspect • Generates a string showing what's inside #<Circle:0x25170 @radius=4> • No way to directly access the internals however (more in a minute)
  654. Copyright (C) 2008, http://www.dabeaz.com 5- Instance Data 28 • Important

    point : The set of instance variables on an object is not restricted or declared • Whenever a method assigns to @varname, that creates a new instance variable class Circle <Shape def initialize(radius) @radius = radius end ... def set_color(color) @color=color end end This "spontaneously" creates a new instance variable when called the first time
  655. Copyright (C) 2008, http://www.dabeaz.com 5- Instance Data 29 • All

    instance variables in Ruby are private • The only way to access is through methods class Circle def initialize(radius) @radius = radius end def radius @radius end def radius=(value) @radius=value end end Return the value Set the value
  656. Copyright (C) 2008, http://www.dabeaz.com 5- Using Accessors 30 • Example

    of using the accessor methods c = Circle.new(4) puts c.radius # Prints 4 c.radius=5 puts c.area # Prints 78.5398163397448 Both of these operations are actually method calls • Important point : There is never direct access to instance data in Ruby. It's always a method.
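Python can imitate this style with properties. In the sketch below, reading c.radius and assigning c.radius = 5 both run methods, much like Ruby's radius and radius= accessors. The underscore-prefixed attribute name is only a convention chosen for this example.

```python
import math

class Circle:
    def __init__(self, radius):
        self._radius = radius       # leading underscore: private by convention only

    @property
    def radius(self):               # plays the role of Ruby's `def radius`
        return self._radius

    @radius.setter
    def radius(self, value):        # plays the role of Ruby's `def radius=(value)`
        self._radius = value

    def area(self):
        return math.pi * self._radius ** 2

c = Circle(4)
c.radius = 5                        # runs the setter method
print(c.radius, c.area())
```

Unlike Ruby, Python does not enforce the privacy: c._radius is still reachable from outside.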
  657. Copyright (C) 2008, http://www.dabeaz.com 5- Variables vs. Methods 31 •

    Instance variables and methods are separate • Plus, there is special syntax to distinguish instance variables from methods (@varname) • So, it does not matter that there is an instance variable called @radius and a method called radius.
  658. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 32 • A class

    may inherit from one other class class Shape def initialize @x = 0 @y = 0 end def move(dx,dy) @x += dx @y += dy end end class Circle <Shape ... end superclass
  659. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 33 • If no

    superclass is listed, Object is assumed class Shape ... end class Shape <Object ... end == • Ruby only supports single inheritance. • So, there is always just one superclass
  660. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance & Initialization 34 •

    Derived classes must initialize parents • This is done using "super" class Shape def initialize @x = 0 @y = 0 end end class Circle <Shape def initialize(radius) super() # Initialize parent @radius = radius end end
  661. Copyright (C) 2008, http://www.dabeaz.com 5- Super 35 • Within any

    method, super is a special keyword that refers to the same method in the parent class (superclass) • This is used when you re-implement a method, but still want to call the original version within the new method
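Python's super() serves the same purpose. In this sketch the describe method is invented for illustration: Circle re-implements it but still calls the Shape version from inside the new one.

```python
class Shape:
    def __init__(self):
        self.x = 0
        self.y = 0

    def describe(self):
        return "shape at (%d, %d)" % (self.x, self.y)

class Circle(Shape):
    def __init__(self, radius):
        super().__init__()                 # initialize the parent first
        self.radius = radius

    def describe(self):
        # Re-implement the method but still call the original version.
        return "circle r=%d, %s" % (self.radius, super().describe())

print(Circle(4).describe())
```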
  662. Copyright (C) 2008, http://www.dabeaz.com 5- Some Shortcuts 36 • Instance

    data always has to be accessed through methods, but defining those methods repeatedly gets tedious and annoying • Here is a shortcut class Shape attr_reader :x, :y def initialize @x = 0 @y = 0 end end This creates accessor methods for reading the values def x @x end def y @y end
  663. Copyright (C) 2008, http://www.dabeaz.com 5- Some Shortcuts 37 • Creating

    attribute writers class Shape attr_reader :x, :y attr_writer :x, :y def initialize @x = 0 @y = 0 end end This creates accessor methods for writing the values def x=(value) @x=value end def y=(value) @y=value end • Note - :x is the Ruby syntax for a symbol
  664. Copyright (C) 2008, http://www.dabeaz.com 5- Concept of Attributes 38 •

    All access to an object occurs via methods • However, methods that take no arguments also look like data "attributes" class Circle <Shape ... def radius @radius end def area Math::PI*@radius **2 end def perimeter 2*Math::PI*@radius end end c = Circle.new(4) puts c.radius puts c.area puts c.perimeter ... Notice how the access is very "uniform"
  665. Copyright (C) 2008, http://www.dabeaz.com 5- Attributes 39 • An attribute

    is part of the "public" interface of an object that's presented to a user • It has nothing to do with the internal state stored by an object (instance data) • Example : Certain attributes are stored (the radius), but others are computed (the area) • This concept of hiding internals behind methods is very big in OO-programming
  666. Copyright (C) 2008, http://www.dabeaz.com 5- Example : C++ 40 From

    "Effective C++", by S. Meyers Access methods Instance data Motivation
  667. Copyright (C) 2008, http://www.dabeaz.com 5- Class Variables 41 • Classes

    are also "objects" in Ruby • This is subtle, but a class can have its own internal variables (like instance data) class Shape @@ncreated = 0 # Class variable def initialize @x = 0 @y = 0 @@ncreated += 1 end end • A class variable is shared by all instances (there is just one copy of the variable)
  668. Copyright (C) 2008, http://www.dabeaz.com 5- Class Variables 42 • Class

    variables are not part of instances irb(main):002:0> s = Shape.new => #<Shape:0x625b0 @y=0, @x=0> irb(main):003:0> You do not see a reference to @@ncreated here • Nor are they readable... (They're private too) irb(main):002:0> Shape.ncreated NoMethodError: undefined method `ncreated' for Shape:Class irb(main):003:0>
  669. Copyright (C) 2008, http://www.dabeaz.com 5- Class Methods 43 • Methods

    can be defined for the class itself class Shape @@ncreated = 0 ... def Shape.ncreated @@ncreated end end Prefix with the class name to define a class method • To use the method, apply it to the class irb(main):006:0> s = Shape.new => #<Shape:0x58cf4 @x=0, @y=0> irb(main):007:0> Shape.ncreated => 1 irb(main):008:0>
  670. Copyright (C) 2008, http://www.dabeaz.com 5- Class Methods 44 • Class

    methods only operate on classes, not instances of objects defined by a class • This is somewhat subtle irb(main):006:0> s = Shape.new => #<Shape:0x58cf4 @x=0, @y=0> irb(main):007:0> Shape.ncreated => 1 irb(main):008:0> s.ncreated NoMethodError: undefined method `ncreated' for #<Shape:0x58cf4 @x=0, @y=0> from (irb):8 irb(main):009:0>
  671. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 45 • Class methods

    are a "deep concept" • When you define a class, you're actually defining two different kinds of objects • Instances of the class • The class itself • Although they're related, these objects are distinct from each other and handled separately (more in a minute)
  672. Copyright (C) 2008, http://www.dabeaz.com 5- Class Extension 46 • Classes

    are "open" in Ruby • After a class has been defined, you can later open it and add new methods to it • Repeated use of class merely extends the previous definition with new methods
  673. Copyright (C) 2008, http://www.dabeaz.com 5- Class Extension 47 • Example:

    Here's a class class Circle <Shape def initialize(radius) @radius = radius end ... end • And some code that extends it c = Circle.new(4) # Create a circle class Circle # Add a new method def holler puts "I'm a happy shiny circle" end end c.holler # Print 'I'm a happy ...'
  674. Copyright (C) 2008, http://www.dabeaz.com 5- Class Extension 48 • Changes

    affect instances already created! c = Circle.new(4) d = Circle.new(5) puts c.area # Prints 50.2654824574367 puts d.area # Prints 78.5398163397448 class Circle def area (4/(5/4.0))*@radius**2 end end puts c.area # Prints 51.2 puts d.area # Prints 80.0
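Python classes are open in a similar way, although the syntax differs: instead of reopening the class, you assign a new function to a class attribute. A sketch using the same numbers as the slide (the function name indiana_area is made up):

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

c = Circle(4)
before = c.area()

def indiana_area(self):
    return (4 / (5 / 4.0)) * self.radius ** 2

Circle.area = indiana_area        # replace the method on the existing class
after = c.area()                  # the already-created instance sees the change
print(before, after)
```

This works for the same reason as in Ruby: the instance holds only data, and every method call looks up the class's current definition.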
  675. Copyright (C) 2008, http://www.dabeaz.com 5- Anonymous Classes 49 • Defines

    new methods for a single instance c = Circle.new(4) d = Circle.new(4) # Circle c is moving to Indiana. Fix it class <<c def area 4/(5/4.0)*@radius**2 end end puts c.area # Prints 51.2 puts d.area # Prints 50.2654824574367
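A per-instance override can be sketched in Python too, though the mechanism is not the same. Ruby creates a hidden singleton class for c. Python just stores a bound method in that one instance's own dictionary, here with types.MethodType.

```python
import math
import types

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

c = Circle(4)
d = Circle(4)

def indiana_area(self):
    return 4 / (5 / 4.0) * self.radius ** 2

# Bind the function to this one instance only; the class is untouched.
c.area = types.MethodType(indiana_area, c)

print(c.area(), d.area())
```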
  676. Copyright (C) 2008, http://www.dabeaz.com 5- Modules 50 • Ruby provides

    a "module" mechanism module MoveLRUD # Methods for moving def left(dx) # left,right,up,down move(-dx,0) end def right(dx) move(dx,0) end def up(dy) move(0,-dy) end def down(dy) move(0,dy) end end • A module is a namespace • It can contain instance methods/class methods
  677. Copyright (C) 2008, http://www.dabeaz.com 5- Modules 51 • Modules are

    a collection of methods, but they do not define any kind of class or instance • Which makes them rather odd creatures irb(main):010:0> MoveLRUD.left(3) NoMethodError: undefined method `left' for MoveLRUD:Module from (irb):10 irb(main):011:0> • If you define instance methods in a module, there's no obvious way to use them
  678. Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 52 • A module

    can be included into a class class Shape include MoveLRUD # Include as a mixin def move(dx,dy) @x += dx @y += dy end end • This takes all of the methods in the module and makes them part of the class as if they were defined there • It "mixes in" the other methods
  679. Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 53 • A module

    can also be mixed into an instance a = Shape.new b = Shape.new b.extend(MoveLRUD) b.left(4) b.up(3) a.left(4) # Error. Left not defined • Here, the module methods only work on the specific instance that was extended
  680. Copyright (C) 2008, http://www.dabeaz.com 5- Mixin Commentary 54 • With

    mixins, you implement common functionality in one place (a module) • You then include it in a variety of different places over and over again to reuse it • Note: This is a slightly different concept than "inheritance"
  681. Copyright (C) 2008, http://www.dabeaz.com 5- Access Control 55 • Methods

    are normally public, meaning anyone can call them • Can also have protected and private methods class Foo private def bar ... end protected def spam ... end end
  682. Copyright (C) 2008, http://www.dabeaz.com 5- Access Control 56 • Protected

    • Method can be called by any method of the defining class or subclasses • May be invoked on other instances of that class • Private • Method can be called by methods of the defining class or its subclasses • But only on the current object (no explicit receiver)
  683. Copyright (C) 2008, http://www.dabeaz.com 5- Object Implementation 57 • All

    objects have the same representation • A set of instance variables • A reference to the class • Some flags (related to internals) flags ivars class
  684. Copyright (C) 2008, http://www.dabeaz.com 5- Instances and Classes 58 flags

    ivars class class Circle <Shape def initialize(radius) @radius = radius end end c = Circle.new(4) d = Circle.new(5) flags ivars class c d { 'radius' => 4, ... } { 'radius' => 5, ... } Circle All instances link back to their class
  685. Copyright (C) 2008, http://www.dabeaz.com 5- Class Implementation 59 • A

    class is an object with additional information • A reference to the superclass • A list of methods flags ivars class super methods • Important : A class is also an object
  686. Copyright (C) 2008, http://www.dabeaz.com 5- Class Implementation class Circle <Shape

    ... def area ... end def perimeter ... end ... end { } Circle flags ivars class super methods { 'area'=> method, 'perimeter' => method, ... } Shape flags ivars class super methods class Shape @@ncreated = 0 def move ... end end { 'ncreated' => 0 } { 'move' => method, ... } Object 60
  687. Copyright (C) 2008, http://www.dabeaz.com 5- Method Dispatch 61 • Instances

    are linked to classes • Classes are linked to the superclass • This is the key to knowing how methods get dispatched to the appropriate definition • Essentially you just follow those links
  688. Copyright (C) 2008, http://www.dabeaz.com 5- Method Dispatch Circle flags ivars

    class super methods { 'area'=> method, 'perimeter' => method, } Shape flags ivars class super methods { 'move' => method } Object c = Circle.new(4) c.area c.move Every method call involves a search of the class and all base classes (just follow super) 62
  689. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 63 • The entire

    object system is just a big tree flags ivars class flags ivars class Instances flags ivars class super methods flags ivars class super methods Classes
  690. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 64 • Everything that

    happens with objects is ultimately related to that tree structure • Inserting nodes into the tree • Creating various links between nodes
  691. Copyright (C) 2008, http://www.dabeaz.com 5- Extending an Instance 65 flags

    ivars class flags ivars class super methods c = Circle.new(4) Circle c { 'area'=> meth, 'perimeter' => } Start by creating a new instance
  692. Copyright (C) 2008, http://www.dabeaz.com 5- Extending an Instance 66 flags

    ivars class flags ivars class super methods Circle c { 'area'=> meth, 'perimeter' => } Now, let's extend that instance by redefining area c = Circle.new(4) class <<c def area 4/(5/4.0)*@radius**2 end end The new area method needs to be inserted here
  693. Copyright (C) 2008, http://www.dabeaz.com 5- Extending an Instance 67 flags

    ivars class flags ivars class super methods c = Circle.new(4) class <<c def area 4/(5/4.0)*@radius**2 end end flags ivars class super methods Circle <virtual> c { 'area'=> meth } { 'area'=> meth, 'perimeter' => } A "virtual" anonymous class gets inserted into the class chain for c V
  694. Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 68 module Foo def

    bar puts 'Foo.bar' end end First define a module A Module is just a collection of methods flags ivars class super methods Foo (Module) { 'bar'=> meth, }
  695. Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 69 flags ivars class

    super methods module Foo def bar puts 'Foo.bar' end end class Circle <Shape ... end Circle flags ivars class super methods Foo (Module) flags ivars class super methods Shape Now, start defining a class { 'bar'=> meth, }
  696. Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 70 flags ivars class

    super methods module Foo def bar puts 'Foo.bar' end end class Circle <Shape include Foo ... end Circle flags ivars class super methods Foo (Module) flags ivars class super methods Shape Include a module as a mixin { 'bar'=> meth, } The functionality of Foo needs to be added to Circle somehow
  697. Copyright (C) 2008, http://www.dabeaz.com 5- Mixins 71 flags ivars class

    super methods module Foo def bar puts 'Foo.bar' end end class Circle <Shape include Foo ... end Circle flags ivars class super methods Foo (Module) flags ivars class super methods Shape Again, an anonymous class gets inserted into the class chain { 'bar'=> meth, } flags ivars class super methods Mixin Proxy
  698. Copyright (C) 2008, http://www.dabeaz.com 5- Classes are Objects 72 •

    Deep thought : A class is also an object • If so, it must belong to some class! • It does - check it out class Circle ... end puts Circle.class # Prints 'Class' • The output says that "Circle" is a "Class"
  699. Copyright (C) 2008, http://www.dabeaz.com 5- Class Objects 73 flags ivars

    class super methods • Here is a picture Circle flags ivars class super methods Class Circle is a Class • Notice the parallel to instances flags ivars class super methods Circle flags ivars class c = Circle.new(r) c is a Circle
  700. Copyright (C) 2008, http://www.dabeaz.com 5- What's a Class? 74 •

    What is this "Class"? • Well, a "Class" is just another object • Which is, well, also a Class irb(main):001:0> puts Class.class Class => nil irb(main):002:0> • Huh?!?!?!
  701. Copyright (C) 2008, http://www.dabeaz.com 5- Taking the Red Pill 75

    flags ivars class super methods • Just sketch it out... Circle flags ivars class super methods Class Circle is a Class Class is a Class • Clearly, something is going on here • Let's take a look at the superclasses...
  702. Copyright (C) 2008, http://www.dabeaz.com 5- Classes are Modules 76 •

    Inspect the superclass irb(main):001:0> puts Class.class Class => nil irb(main):002:0> puts Class.superclass Module => nil • Whoa, a class is a kind of Module • And a Module is a namespace • And in the last lecture we saw how you could implement objects using namespaces
  703. Copyright (C) 2008, http://www.dabeaz.com 5- Classes and Modules 77 flags

    ivars class super methods Circle flags ivars class super methods Class Circle is a Class Class is a Class flags ivars class super methods Module Classes are implemented on top of Modules
  704. Copyright (C) 2008, http://www.dabeaz.com 5- Modules are Objects 78 •

    Inspect the class and superclass irb(main):001:0> puts Module.class Class => nil irb(main):002:0> puts Module.superclass Object => nil • A module is an object like everything else • There is a class (Module) • Module inherits from Object
  705. Copyright (C) 2008, http://www.dabeaz.com 5-79 flags ivars class super methods

    Circle flags ivars class super methods Class flags ivars class super methods Module flags ivars class super methods Object Here, you are seeing inheritance, but it's all about classes. 1. A Circle is a Class 2. A Class is a Module 3. A Module is an Object
  706. Copyright (C) 2008, http://www.dabeaz.com 5- Everything is an Object 80

    • A look at "Object" irb(main):001:0> puts Object.class Class => nil irb(main):002:0> puts Object.superclass nil • An Object is described by a Class • There are no parents to an Object • That's the end of the line...
  707. Copyright (C) 2008, http://www.dabeaz.com 5-81 flags ivars class super methods

    Circle flags ivars class super methods Class flags ivars class super methods Module flags ivars class super methods Object nil Object has no parents. So, it terminates the chain of linked objects
  708. Copyright (C) 2008, http://www.dabeaz.com 5-82 flags ivars class super methods

    Circle flags ivars class super methods Class flags ivars class super methods Module flags ivars class super methods Object flags ivars class super methods Shape Let's add in some other classes nil
  709. Copyright (C) 2008, http://www.dabeaz.com 5- How to make sense of

    it? 83 • Everything ultimately leads to Object • That's because everything is an Object • All objects are described by a class
  710. Copyright (C) 2008, http://www.dabeaz.com 5- How to make sense of

    it? 84 • There are always two different paths • The "instance" path • The "class" path • Instance path : Instance methods • Class path : Class methods • The choice depends on the starting object
  711. Copyright (C) 2008, http://www.dabeaz.com 5-85 flags ivars class super methods

    Circle flags ivars class super methods Object flags ivars class super methods Shape nil The Instance Path # c is a Circle instance c.area Here, you start with an instance of Circle. flags ivars class c
  712. Copyright (C) 2008, http://www.dabeaz.com 5-86 flags ivars class super methods

    Circle flags ivars class super methods Class flags ivars class super methods Module flags ivars class super methods Object nil The Class Path c = Circle.new(4) Here, you start with the class itself (Circle).
  713. Copyright (C) 2008, http://www.dabeaz.com 5- Method Resolution 87 • Notice

    how the search process is highly uniform • In fact, there is a simple algorithm obj.meth 1. Follow the class link of obj 2. Look in the method table 3. If not found, follow the super link 4. Repeat steps 2-3 until you find the method
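The lookup steps above can be modelled in a few lines of Python. This is only a toy model under stated assumptions, not Ruby's actual implementation: the names `Klass`, `Instance`, and `dispatch` are hypothetical, and methods are plain functions stored in a dict.

```python
import math

class Klass:
    """Toy class record: a method table plus a link to the superclass."""
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass      # the "super" link (None ends the chain)
        self.methods = methods or {}      # the method table

class Instance:
    """Toy instance record: instance variables plus a link to the class."""
    def __init__(self, klass, **ivars):
        self.klass = klass                # the "class" link
        self.ivars = ivars

def dispatch(obj, name, *args):
    klass = obj.klass                     # 1. follow the class link
    while klass is not None:
        if name in klass.methods:         # 2. look in the method table
            return klass.methods[name](obj, *args)
        klass = klass.superclass          # 3. not found: follow the super link
    raise AttributeError("undefined method %r" % name)   # chain exhausted

Object = Klass("Object")
Shape = Klass("Shape", Object, {"move": lambda self, dx, dy: (dx, dy)})
Circle = Klass("Circle", Shape,
               {"area": lambda self: math.pi * self.ivars["radius"] ** 2})

c = Instance(Circle, radius=4)
print(dispatch(c, "area"))         # found in Circle's method table
print(dispatch(c, "move", 1, 2))   # found one super link up, in Shape
```

The same loop serves both paths described earlier: only the starting object differs.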
  714. Copyright (C) 2008, http://www.dabeaz.com 5- A Final Complexity 88 •

    Class methods • Recall that class methods are methods that operate on classes, not instances class Shape <Object @@ncreated = 0 def Shape.ncreated # A class method @@ncreated end end • These methods live along the class chain
  715. Copyright (C) 2008, http://www.dabeaz.com 5-89 class Shape <Object @@ncreated =

    0 def Shape.ncreated @@ncreated end end flags ivars class super methods Shape flags ivars class super methods Shape' (virtual) flags ivars class super methods Class { 'ncreated'=> meth } flags ivars class super methods Object Class methods live in a separate anonymous class that's inserted into the class chain (sometimes known as a "metaclass")
  716. Copyright (C) 2008, http://www.dabeaz.com 5- Final Comments 90 • Here

    are the key points • Everything is an object • All objects are described by a class • Classes are objects • Everything is linked together in a big tree/graph
  717. Copyright (C) 2008, http://www.dabeaz.com 5- Objects in Python • Python

    has always had "objects", but OOP was never the overriding design philosophy • In fact, user-defined classes were one of the last features added to the language • Recall that one motivation for Ruby was to address a perceived problem with Python OO 92
  718. Copyright (C) 2008, http://www.dabeaz.com 5- Everything is an Object 93

    • Objects in Python are organized in a hierarchy object int • object is at the top • However, the hierarchy is relatively flat • Example: Don't see numbers grouped together under a class "Numeric" float str list dict
  719. Copyright (C) 2008, http://www.dabeaz.com 5- Inspecting Objects 94 • All

    objects have a "type" >>> x = 37 >>> x.__class__ <type 'int'> >>> • The type is the class to which an object belongs • Finding the parent class (superclass) >>> int.__bases__ (<type 'object'>,) >>>
  720. Copyright (C) 2008, http://www.dabeaz.com 5- Inspecting Objects 95 • Get

    a list of things that are defined (dir) >>> x = 37 >>> dir(x) ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', ... ] >>>
  721. Copyright (C) 2008, http://www.dabeaz.com 5- The class statement • Defines

    a new user-defined object class Circle(object): def __init__(self,radius): self.radius = radius def area(self): return math.pi*(self.radius**2) def perimeter(self): return 2*math.pi*self.radius • A class is a collection of functions (methods) • Nothing conceptually new here 96
  722. Copyright (C) 2008, http://www.dabeaz.com 5- Creating Instances • The class

    serves as a "factory" >>> c = Circle(4.0) >>> c.area() 50.26548245743669 >>> c.radius 4.0 >>> 97 • Note : You don't call a special method like "new", you just use the class as a function
  723. Copyright (C) 2008, http://www.dabeaz.com 5- __init__ method • Used to

    initialize objects • Called whenever a new object is created >>> c = Circle(4.0) class Circle(object): def __init__(self,radius): self.radius = radius newly created object • __init__ is an example of a "special method" • Has special meaning to the Python interpreter 98
  724. Copyright (C) 2008, http://www.dabeaz.com 5- Instance Variables • Data stored

    within the object class Circle(object): def __init__(self,radius): self.radius = radius • Outside class, just access through the instance name • Inside class, referenced using self.attrname def area(self): return math.pi*(self.radius**2) >>> c = Circle(4.0) >>> c.radius 4.0 99
  725. Copyright (C) 2008, http://www.dabeaz.com 5- Methods • Functions applied to

    instances of an object class Circle(object): ... def area(self): return math.pi*self.radius**2 • By convention, called "self" • The object is always passed as first argument >>> c.area() def area(self): ... The name is unimportant---the object is always passed as the first argument. It is simply Python programming style to call this argument "self." C++ programmers might prefer to call it "this." 100
  726. Copyright (C) 2008, http://www.dabeaz.com 5- Python Differences • You're already

    seeing some huge differences from the Ruby object system • All instance variables are fully visible >>> c = Circle(4) >>> c.radius 4 >>> c.radius = 5 >>> 101 • Explicit use of "self" to refer to the instance def area(self): return math.pi*self.radius**2
  727. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance • It's fully supported

    • List base classes when defining the class class Parent(object): ... class Derived(Parent): ... • Bases specified in () after class name 102
  728. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance Example • Shapes and

    Circles class Shape(object): def __init__(self): self.x = 0 self.y = 0 def move(self,dx,dy): self.x += dx self.y += dy class Circle(Shape): def __init__(self,radius): Shape.__init__(self) self.radius = radius def area(self): return math.pi*self.radius**2 103
  729. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance • Classes in

    Python may have multiple bases class Foo(object): ... class Bar(object): ... class Spam(Foo,Bar): ... • We have not seen this before • Not allowed in Ruby, Smalltalk, Java, etc. • There are some nasty issues (later) 104
  730. Copyright (C) 2008, http://www.dabeaz.com 5- Class Variables • Class can

    have variables. Just define in the class definition class Shape(object): numcreated = 0 # class variable def __init__(self): Shape.numcreated += 1 self.x = 0.0 self.y = 0.0 >>> Shape.numcreated 0 >>> s = Shape() >>> Shape.numcreated 1 >>> 105
  731. Copyright (C) 2008, http://www.dabeaz.com 5- Class Methods • Require special

    "decoration" class Shape(object): @classmethod def spam(cls): print "Hello. Your class is", cls class Circle(Shape): ... >>> Circle.spam() Hello. Your class is <class '__main__.Circle'> >>> Shape.spam() Hello. Your class is <class '__main__.Shape'> >>> 106 • Class methods receive the class itself as the first argument (classes are also objects)
  732. Copyright (C) 2008, http://www.dabeaz.com 5- An Oddity • Class methods/variables

    and instance methods/variables are co-mingled class Shape(object): @classmethod def spam(cls): print "Hello. Your class is", cls class Circle(Shape): ... >>> c = Circle(4) >>> c.spam() Hello. Your class is <class '__main__.Circle'> >>> c.numcreated 1 >>> 107 • Class methods can be invoked via instance
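A small runnable sketch of this co-mingling, written in Python 3 syntax (the slides use Python 2). It combines the `numcreated` class variable and the `spam` class method from the earlier slides. Having `spam` return the class instead of printing it is a change made here for illustration.

```python
class Shape:
    numcreated = 0                 # class variable, stored on the class

    def __init__(self):
        Shape.numcreated += 1
        self.x = 0.0
        self.y = 0.0

    @classmethod
    def spam(cls):
        return cls                 # receives the class, not the instance

class Circle(Shape):
    def __init__(self, radius):
        Shape.__init__(self)
        self.radius = radius

c = Circle(4)
print(c.spam() is Circle)            # True: class method reached through an instance
print(c.numcreated)                  # 1: class variable reached through an instance
print("numcreated" in c.__dict__)    # False: the value still lives on the class
```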
  733. Copyright (C) 2008, http://www.dabeaz.com 5- Other Differences • Classes aren't

    quite "open" in the same way as they are in Ruby • If the same class definition appears more than once, the new definition replaces the old definition (but it does not affect existing instances) 108
  734. Copyright (C) 2008, http://www.dabeaz.com 5- Other Differences • Example :

    Class redefinition 109 class Foo(object): def bar(self): print "Hello World" a = Foo() class Foo(object): def bar(self): print "Hello Cruel World" b = Foo() a.bar() # Prints "Hello World" b.bar() # Prints "Hello Cruel World"
  735. Copyright (C) 2008, http://www.dabeaz.com 5- Class Extension • You can

    add new methods to an existing class by just defining them outside and attaching them to the class object 110 class Circle(object): def __init__(self,radius): self.radius = radius def area(c): return math.pi*c.radius**2 Circle.area = area
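Unlike redefining a class, attaching a function to an existing class object does affect instances that were already created, because they still point at the same class. A minimal sketch in Python 3 syntax:

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

c = Circle(4.0)                # created before the class has an area method

def area(self):
    return math.pi * self.radius ** 2

Circle.area = area             # attach the function to the class object

print(c.area())                # the existing instance sees the new method
```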
  736. Copyright (C) 2008, http://www.dabeaz.com 5- That's it (Mostly) • Believe

    it or not, that is about the extent of defining and using classes in Python • A class is just a bunch of functions • The functions are normally an instance method that receives the instance as the first parameter (self) • Can optionally be defined as class methods that receive the class as the first parameter 111
  737. Copyright (C) 2008, http://www.dabeaz.com 5- That's it (Mostly) • There

    is no access control (public, private, protected) • No separate notion of "modules" and mixins 112
  738. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • Most of these

    features are available, but they just take a different form • Example : Mixins using multiple inheritance 113 class Shape(object): def move(self,dx,dy): self.x += dx self.y += dy class MoveLRUD(object): def left(self,dx): self.move(-dx,0) def right(self,dx): self.move(dx,0) def up(self,dy): self.move(0,-dy) def down(self,dy): self.move(0,dy) class Circle(Shape,MoveLRUD): ...
  739. Copyright (C) 2008, http://www.dabeaz.com 5- Interlude • Here is the

    Python view on objects... • An instance is just a collection of stuff • A class is just a collection of stuff • A dictionary is just a collection of stuff • Hey, I'll just use that! 114
  740. Copyright (C) 2008, http://www.dabeaz.com 5- Object Implementation • Python's implementation

    of objects is mainly just a wrapper layer • Objects and classes are just wrappers around dictionaries • And methods are just wrappers around ordinary functions • Let's go take a look... 115
  741. Copyright (C) 2008, http://www.dabeaz.com 5- Recall: Classes • A class

    definition class Circle(Shape): def __init__(self,radius): Shape.__init__(self) self.radius = radius def area(self): return math.pi*self.radius**2 def perimeter(self): return 2*math.pi*self.radius • A class creates a special kind of object >>> Circle <class '__main__.Circle'> >>> • What is this object? 116 A Class Object
  742. Copyright (C) 2008, http://www.dabeaz.com 5- Class Objects • It's a

    wrapper around a dictionary >>> Circle.__dict__ <dictproxy object at 0x54ff0> >>> Circle.__dict__.keys() ['__module__','area','perimeter','__dict__','__weakref__', '__doc__','__init__'] >>> Circle.__dict__['area'] <function area at 0x4d430> >>> Circle.__dict__['perimeter'] <function perimeter at 0x4d4b0> <class Circle> .__dict__ { 'area' : <function>, 'perimeter' : <function>, '__init__' : <function>, } __dict__ class object 117
  743. Copyright (C) 2008, http://www.dabeaz.com 5- Instances • Instance data is

    also stored in a dictionary >>> c = Circle(4) >>> c.__dict__ {'x' : 0,'radius' : 4, 'y' : 0 } >>> • Dictionary holds attributes class Circle(Shape): def __init__(self,radius): Shape.__init__(self) self.radius = radius Instance c of Circle .__dict__ { 'x' : 0, 'radius' : 4, 'y' : 0 } __dict__ instance 118
  744. Copyright (C) 2008, http://www.dabeaz.com 5- Putting it Together • We

    just need to connect the dots • Each instance has a dictionary for its data • Each class has a dictionary for its methods 119
  745. Copyright (C) 2008, http://www.dabeaz.com 5- Instances to Classes • Instances

    hold a reference to their class >>> c = Circle(4) >>> c.__dict__ {'x': 0,'radius': 4,'y': 0 } >>> c.__class__ <class '__main__.Circle'> >>> • __class__ attribute refers to class object 120
  746. Copyright (C) 2008, http://www.dabeaz.com 5- Classes to Superclasses • Example:

    class A(B,C): ... • Classes may inherit from other classes • Bases stored as a tuple in class object >>> A.__bases__ (<class '__main__.B'>,<class '__main__.C'>) >>> • __bases__ is tuple of base class objects 121
  747. Copyright (C) 2008, http://www.dabeaz.com 5- Object Representation .__dict__ {attrs} .__class__

    .__dict__ {attrs} .__class__ .__dict__ {attrs} .__class__ .__dict__ {methods} .__bases__ (base1,base2,...) instances class .__dict__ {methods} .__bases__ .__dict__ {methods} .__bases__ 122 base classes
  748. Copyright (C) 2008, http://www.dabeaz.com 5- Object Representation • Key point:

    Everything stored in dictionaries • Instances have dicts (instance data) • Classes have dicts (methods, class attributes) • Instances, classes, bases are linked together (__class__, __bases__) 123
  749. Copyright (C) 2008, http://www.dabeaz.com 5- Attribute Lookup • Python has

    special operators for getting, setting, and deleting "attributes" 124 obj.name # Get an attribute value obj.name = value # Set an attribute value del obj.name # Delete an attribute • To finish off the object system, you just have to define the behavior of these operators • Connect it up to all of those dictionaries
  750. Copyright (C) 2008, http://www.dabeaz.com 5- Setting Attributes • Setting an

    attribute just updates the local dictionary of object >>> c = Circle(4) >>> c.__dict__ {'x': 0, 'radius': 4,'y' : 0 } >>> c.radius = 5 >>> c.color = "Blue" >>> c.__dict__ { 'x': 0, 'radius': 5, 'y': 0, 'color':'Blue' } >>> • Deleting an attribute just removes it from the dictionary 125
  751. Copyright (C) 2008, http://www.dabeaz.com 5- Setting Attributes • Setting an

    attribute overrides any attributes set in class or bases >>> c = Circle(4) >>> c.area() 50.2654824574367 >>> c.area = "pretty big" >>> c.area() Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: 'str' object is not callable >>> c.area 'pretty big' >>> • One way to create a mighty kerfuffle 126
  752. Copyright (C) 2008, http://www.dabeaz.com 5- Reading Attributes • A more

    complicated problem • Attribute may be supplied from many places • Local dictionary • Class object • Base classes (inheritance) 127
  753. Copyright (C) 2008, http://www.dabeaz.com 5- Reading Attributes • First check

    in local __dict__ • If not found, look in __dict__ of class >>> c = Circle(...) >>> c.radius 4 >>> c.area() 50.2654824574 >>> c .__dict__ .__class__ {'x' : 0, 'radius' : 4, ...} Circle .__dict__ {'area': <func>, 'perimeter':<func>, '__init__':..} • If not found in class, look in base classes .__bases__ look in __bases__ 1 2 3 128
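The three lookup steps can be written out by hand. The sketch below is a simplification under stated assumptions: `lookup` is a hypothetical helper, it walks the class's `__mro__` tuple to visit the class and then its bases, and it deliberately ignores the method wrapping discussed on the next slides, so methods come back as plain functions. It uses Python 3 syntax.

```python
def lookup(obj, name):
    """Simplified attribute read: instance dict, then class, then bases."""
    if name in obj.__dict__:                 # 1. the instance's own dictionary
        return obj.__dict__[name]
    for cls in type(obj).__mro__:            # 2-3. the class, then its bases
        if name in cls.__dict__:
            return cls.__dict__[name]
    raise AttributeError(name)

class Shape:
    def __init__(self):
        self.x = 0
        self.y = 0
    def move(self, dx, dy):
        self.x += dx
        self.y += dy

class Circle(Shape):
    def __init__(self, radius):
        Shape.__init__(self)
        self.radius = radius
    def area(self):
        return 3.14159 * self.radius ** 2

c = Circle(4)
print(lookup(c, "radius"))                            # 4, from c.__dict__
print(lookup(c, "area") is Circle.__dict__["area"])   # True, from the class
print(lookup(c, "move") is Shape.__dict__["move"])    # True, from a base class
```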
  754. Copyright (C) 2008, http://www.dabeaz.com 5- A Subtle Issue • Python

    uses a single dictionary to store everything associated with a class • This dictionary contains both data (class variables) and methods • So, you can't have instance data and methods with the same names (they'll conflict) • However, there is a tricky bit with all of this... 129
  755. Copyright (C) 2008, http://www.dabeaz.com 5- Method Lookup • If you

    look up data, you get the data • If you look up a method, it's different • You don't get the method function! 130 >>> c = Circle(4) >>> c.radius 4 >>> c.area <bound method Circle.area of <__main__.Circle object at 0x6cb50>> >>> • What in the heck is that?
  756. Copyright (C) 2008, http://www.dabeaz.com 5- Bound Methods • Methods always

    get "wrapped" • The returned object is a method that's waiting for you to call it... 131 >>> c = Circle(4) >>> a = c.area >>> a <bound method Circle.area of <__main__.Circle object at 0x6cb50>> >>> a() 50.2654824574 >>> Calls the method The method itself as an object
  757. Copyright (C) 2008, http://www.dabeaz.com 5- Bound Methods • Normally you

    don't see it, but method calls are always a two-step process like this 132 c.area() <bound method : area> . operator - attribute lookup () operator - call 50.2654824574 • Essentially, looking up a method is separate from calling the method
  758. Copyright (C) 2008, http://www.dabeaz.com 5- Bound Methods • Underneath the

    covers 133 >>> c = Circle(4) >>> a = c.area >>> a <bound method Circle.area of <__main__.Circle object at 0x6cb50>> >>> a.im_class <class '__main__.Circle'> >>> a.im_func <function area at 0x69370> >>> a.im_self <__main__.Circle object at 0x6cb50> >>> • What happens on call? >>> a.im_func(a.im_self) 50.2654824574 >>> The class The func Instance
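The `im_func`, `im_self`, and `im_class` names above are Python 2 spellings. In Python 3 the first two are available as `__func__` and `__self__`, and `im_class` has no direct counterpart. A short sketch of the same inspection in Python 3:

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2

c = Circle(4)
a = c.area                                     # lookup only, no call yet
print(a.__self__ is c)                         # True: the instance
print(a.__func__ is Circle.__dict__["area"])   # True: the underlying function
print(a.__func__(a.__self__) == a())           # True: what calling a() does
```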
  759. Copyright (C) 2008, http://www.dabeaz.com 5- Wrapping Concerns • The fact

    that certain items pop out of a class with a wrapper slapped onto it should be somewhat disturbing • Who or what is doing this wrapping? • How does it fit into the rest of the object system? 134
  760. Copyright (C) 2008, http://www.dabeaz.com 5- Descriptors • Class attribute wrapping

    is performed by defining "descriptor" objects • A descriptor is an object that hooks into the attribute access on classes in Python • Allows customized actions to be defined 135
  761. Copyright (C) 2008, http://www.dabeaz.com 5- Descriptors • A Sample Descriptor

    Object class Descriptor(object): def __get__(self,instance,cls): print "get", instance,cls def __set__(self,instance,value): print "set", instance, value def __delete__(self,instance): print "delete", instance 136 • Placing it into a class definition class Foo(object): bar = Descriptor() ... • It just has to have __get__,__set__, etc.
  762. Copyright (C) 2008, http://www.dabeaz.com 5- Descriptors • How it works

    >>> f = Foo() >>> f.bar get <__main__.Foo object at 0x5a810> <class '__main__.Foo'> >>> f.bar = 4 set <__main__.Foo object at 0x5a810> 4 >>> del f.bar delete <__main__.Foo object at 0x5a810> >>> • Attribute access automatically invokes __get__(), __set__(), and __delete__() 137
  763. Copyright (C) 2008, http://www.dabeaz.com 5- Descriptors • Descriptors are used

    for method wrapping class BoundMethod(object): def __init__(self,func,cls,instance): self.im_func = func self.im_class = cls self.im_self = instance def __call__(self,*args,**kwargs): return self.im_func(self.im_self, *args,**kwargs) 138 class InstanceMethodDescriptor(object): def __init__(self,func): self.func = func def __get__(self,instance,cls): return BoundMethod(self.func,cls,instance)
  764. Copyright (C) 2008, http://www.dabeaz.com 5- Descriptors • Example of how

    it is put together def bar_impl(self): print "I'm an instance method bar" class Foo(object): bar = InstanceMethodDescriptor(bar_impl) 139 • Example use: >>> f = Foo() >>> f.bar <__main__.BoundMethod object at 0x6cbb0> >>> f.bar() I'm an instance method bar >>>
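The hand-built descriptor above mirrors what Python already does: ordinary functions have a `__get__` method of their own, so no explicit wrapper class is needed. The sketch below (Python 3 syntax) calls that `__get__` by hand to show where the bound method comes from.

```python
class Foo:
    def bar(self):
        return "I'm an instance method bar"

f = Foo()
func = Foo.__dict__["bar"]          # the raw function stored in the class dict
print(hasattr(func, "__get__"))     # True: plain functions are descriptors
bound = func.__get__(f, Foo)        # what attribute lookup does behind the scenes
print(bound.__self__ is f)          # True
print(bound() == f.bar())           # True
```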
  765. Copyright (C) 2008, http://www.dabeaz.com 5- Descriptor Commentary • If this

    makes sense, you're at 11 with Python • Much of this is tucked away behind the scenes • It's critical to how Python works • But unknown to most Python programmers 140
  766. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance class A(object): pass

    class B(object): pass class C(A,B): pass • Base tuple contains multiple entries object A B C • For example: >>> C.__bases__ (<class '__main__.A'>, <class '__main__.B'>) >>> 141
  767. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance • Attribute lookup

    looks in base classes • However, complex hierarchies make this much more tricky class A(object): def bar(self): pass def spam(self): pass class B(object): def spam(self): pass class C(A,B): pass >>> c = C() >>> c.spam() # Which spam()??? 142
  768. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance • Lookup rules

    • Class is always checked first • Then bases are checked in order listed class A(object): ... class B(object): ... class C(A,B): ... >>> c = C() >>> c.spam() 143 Search order: C, A, B
  769. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance class A(object): pass

    class B(object): pass class C(A,B): pass class D(B): pass class E(C,D): pass • Consider a more complex hierarchy object A B C D E • What happens here? >>> e = E() >>> e.x # Attribute access 144
  770. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance • Search order

    is based on a sort of bases object A B C D E • Search rules >>> e = E() >>> e.x • Can all of these be satisfied? Check E first C before D : class E(C,D) C before A : class C(A,B) C before B : class C(A,B) D before B : class D(B) A before B : class C(A,B) object last • Answer: Yes. E,C, A, D, B, object 145
  771. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance • Method resolution

    order (MRO) • __mro__ attribute contains order in which classes are searched >>> E.__mro__ (<class '__main__.E'>, <class '__main__.C'>, <class '__main__.A'>, <class '__main__.D'>, <class '__main__.B'>, <type 'object'>) • Determination of MRO is rather complex • Beyond scope of this talk • "C3 Linearization Algorithm" 146
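For the curious, the core of C3 linearization fits in a short function. This is a sketch for illustration, not CPython's implementation: `c3_merge` and `linearize` are hypothetical names. The idea is to repeatedly pick the first candidate class that does not appear later in any of the remaining lists. The result is compared against `__mro__` for the hierarchy from the earlier slide (Python 3 syntax).

```python
def c3_merge(sequences):
    """Merge linearizations, always taking a head that appears in no tail."""
    sequences = [list(s) for s in sequences if s]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            if not any(head in other[1:] for other in sequences):
                break                      # head is safe to emit
        else:
            raise TypeError("inconsistent hierarchy")
        result.append(head)
        # drop the chosen head wherever it leads a sequence
        sequences = [s[1:] if s[0] is head else s for s in sequences]
        sequences = [s for s in sequences if s]
    return result

def linearize(cls):
    bases = list(cls.__bases__)
    return [cls] + c3_merge([linearize(b) for b in bases] + [bases])

class A: pass
class B: pass
class C(A, B): pass
class D(B): pass
class E(C, D): pass

print([k.__name__ for k in linearize(E)])    # E, C, A, D, B, object
print(linearize(E) == list(E.__mro__))       # True
```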
  772. Copyright (C) 2008, http://www.dabeaz.com 5- Multiple Inheritance • Can define

    classes that are rejected! • Example: class A(object): pass class B(object): pass class C(A,B): pass class D(B,C): pass Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: Error when calling the metaclass bases Cannot create a consistent method resolution order (MRO) for bases B, C object A B C D • Reason: class D(B,C) --> B before C class C(A,B) --> C before B (B is base of C) 147
  773. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • Objects in Python

    are much more exposed • No notion of private data • Implementation is completely visible • Again, it's really just a layer that's been wrapped around dictionaries • However, over time, various "tweaks" have shown up in the language 148
  774. Copyright (C) 2008, http://www.dabeaz.com 5- Private Attributes • Any attribute

    with leading __ is "private" class Foo(object): def __init__(self): self.__x = 0 • Example >>> f = Foo() >>> f.__x AttributeError: 'Foo' object has no attribute '__x' >>> • This is really just a name mangling trick >>> f = Foo() >>> f._Foo__x 0 >>> 149
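One practical consequence of the mangling trick, sketched here in Python 3 syntax: a subclass that uses the same `__x` name gets its own mangled attribute, so the two don't collide. The `Bar` subclass is a hypothetical addition, not something from the slides.

```python
class Foo(object):
    def __init__(self):
        self.__x = 0             # Actually stored as _Foo__x

class Bar(Foo):
    def __init__(self):
        super().__init__()
        self.__x = 42            # Stored as _Bar__x, no clash with Foo

b = Bar()
names = sorted(vars(b))          # Names in the instance dictionary
print(names)
```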
  775. Copyright (C) 2008, http://www.dabeaz.com 5- __slots__ Attribute • You can

    restrict the set of attribute names class Foo(object): __slots__ = ['x','y'] ... • Produces errors for other attributes >>> f = Foo() >>> f.x = 3 >>> f.y = 20 >>> f.z = 1 Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'Foo' object has no attribute 'z' • Prevents errors, restricts usage of objects 150
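A runnable version of the `__slots__` example, in Python 3 syntax. One extra detail shown here as a sketch: when a class defines `__slots__` (and its bases don't provide one), instances no longer carry a per-instance `__dict__`. The `blocked` and `has_dict` names are illustrative.

```python
class Foo(object):
    __slots__ = ['x', 'y']

f = Foo()
f.x = 3
f.y = 20

try:
    f.z = 1                      # Not listed in __slots__
    blocked = False
except AttributeError:
    blocked = True

has_dict = hasattr(f, '__dict__')   # Slots replace the instance dictionary
print(blocked, has_dict)
```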
  776. Copyright (C) 2008, http://www.dabeaz.com 5- Properties • Consider a class

    with some accessor funcs class Foo(object): def __init__(self,name): self.__name = name def getName(self): return self.__name def setName(self,name): if not isinstance(name,str): raise TypeError, "Expected a string" self.__name = name • Property maps accessor funcs to attribute class Foo(object): ... name = property(getName,setName) ... 151
  777. Copyright (C) 2008, http://www.dabeaz.com 5- Properties • Example: >>> f

    = Foo("Elwood") >>> f.getName() 'Elwood' >>> f.name = 'Jake' >>> f.getName() 'Jake' >>> f.name = 45 TypeError: Expected a string >>> 152 Notice attribute assignment is caught • Properties would be the closest equivalent to how Ruby deals with attributes (via methods)
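For reference, here is the property example assembled into one runnable piece. It is a sketch in Python 3 syntax, so the `raise` statement uses the call form rather than the Python 2 comma form on the slide; the `caught` variable is only for illustration.

```python
class Foo(object):
    def __init__(self, name):
        self.__name = name

    def getName(self):
        return self.__name

    def setName(self, name):
        if not isinstance(name, str):
            raise TypeError("Expected a string")
        self.__name = name

    # Map the accessor functions onto a plain-looking attribute
    name = property(getName, setName)

f = Foo("Elwood")
f.name = "Jake"                  # Routed through setName()

try:
    f.name = 45                  # Rejected by the type check in setName()
    caught = False
except TypeError:
    caught = True

print(f.name, caught)
```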
  778. Copyright (C) 2008, http://www.dabeaz.com 5- Attribute Access • User defined

    classes may redefine attribute access entirely • In other words, you can redefine (.) • Set of special methods for setting, deleting, and getting attributes 153
  779. Copyright (C) 2008, http://www.dabeaz.com 5- __getattribute__() • __getattribute__(self,name) • Called

    every time an attribute is read • Default behavior looks at instance dict • Then it checks the class dict • Then it checks base classes (inheritance) • If that fails, __getattr__(self,name) method is invoked 154
  780. Copyright (C) 2008, http://www.dabeaz.com 5- __getattr__() method • __getattr__(self,name) •

    A failsafe method. Called if an attribute can't be found using the standard mechanism • Default behavior is to raise AttributeError 155
  781. Copyright (C) 2008, http://www.dabeaz.com 5- __setattr__() method • __setattr__(self,name,value) •

    Called every time an attribute is set • Default behavior is to store value in local dictionary of self 156
  782. Copyright (C) 2008, http://www.dabeaz.com 5- Example: class Circle(Shape): def __init__(self,radius):

    self.radius = radius def __getattr__(self,name): if name == 'area': return math.pi*self.radius**2 elif name == 'perimeter': return 2*math.pi*self.radius else: return Shape.__getattr__(self,name) >>> c = Circle(4) >>> c.radius 4 >>> c.area 50.2654824574 >>> c.perimeter 25.132741228718345 >>> 157
  783. Copyright (C) 2008, http://www.dabeaz.com 5- Classes as Objects • Like

    Ruby, Python also has the concept of classes as objects • However, there is a huge twist to it • Python lets you redefine what a class is! • Let's go take a look... 158
  784. Copyright (C) 2008, http://www.dabeaz.com 5- Overview • When you define

    a class, you get an "object" 159 class Circle(Shape): def __init__(self,radius): Shape.__init__(self) self.radius = radius def area(self): return math.pi*self.radius**2 def perimeter(self): return 2*math.pi*self.radius >>> Circle <class '__main__.Circle'> • A "class object"
  785. Copyright (C) 2008, http://www.dabeaz.com 5- Classes as Objects • Classes

    are instances of "types" >>> class Circle(Shape): pass >>> type(Circle) <type 'type'> >>> isinstance(Circle,type) True >>> Recall: type() tells you the type of an object. Here we're using it on a class itself. 160 • Here, Python is following the convention that you see in C++/Java • Classes define types
  786. Copyright (C) 2008, http://www.dabeaz.com 5- Creating Types • So, class

    definitions create new types. • However, a type is just a class class type(object): def __init__(self, *args, **kwargs): ... >>> type <type 'type'> >>> 161 • It's a class that creates new "types" • This is something known as a "metaclass"
  787. Copyright (C) 2008, http://www.dabeaz.com 5- What is a class? •

    Consider a class: 162 • What are its components? • Name ("Circle") • Base classes (Shape) • Functions (__init__,area, perimeter) class Circle(Shape): def __init__(self,radius): Shape.__init__(self) self.radius = radius def area(self): return math.pi*self.radius**2 def perimeter(self): return 2*math.pi*self.radius
  788. Copyright (C) 2008, http://www.dabeaz.com 5- Creating a Class • You

    can create a class without using the class statement (just assemble the pieces) 163 def __init__(self,radius): Shape.__init__(self) self.radius = radius def area(self): return math.pi*self.radius**2 def perimeter(self): return 2*math.pi*self.radius >>> methods = { ... '__init__' : __init__, ... 'area' : area, ... 'perimeter' : perimeter } ... >>> Circle = type("Circle",(Shape,),methods) >>> Circle <class '__main__.Circle'> >>>
  789. Copyright (C) 2008, http://www.dabeaz.com 5- Class Definition • What happens

    during class definition? • Step1: Body of class is extracted (into a string) body = """ def __init__(self,radius): Shape.__init__(self) self.radius = radius def area(self): return math.pi*self.radius**2 def perimeter(self): return 2*math.pi*self.radius """ 164 class Circle(Shape): def __init__(self,radius): Shape.__init__(self) self.radius = radius def area(self): return math.pi*self.radius**2 def perimeter(self): return 2*math.pi*self.radius
  790. Copyright (C) 2008, http://www.dabeaz.com 5- Class Definition • Step 2:

    Body is exec'd in its own dictionary __dict__ = { } exec body in globals(), __dict__ • The statements in the body execute • Afterwards, __dict__ is populated >>> __dict__ {'__init__' : <function __init__ at 0x4da10>, 'area' : <function area at 0x4dd70>, 'perimeter': <function perimeter at 0x4dea0>,} >>> 165
  791. Copyright (C) 2008, http://www.dabeaz.com 5- Class Definition • Step 3:

    Class is constructed from its name, base classes, and the dictionary >>> Circle = type("Circle",(Shape,),__dict__) >>> Circle <class '__main__.Circle'> >>> c = Circle(4) >>> c.area() 50.2654824574 >>> • type(name, bases, dict) constructs a class object 166
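The three steps can be strung together by hand. This is a sketch in Python 3 syntax, where `exec` is a function rather than a statement; the stand-in `Shape` class and the `env` and `namespace` dictionaries are assumptions made for the example, and the real interpreter compiles the class body rather than literally passing a string around.

```python
import math

class Shape(object):             # Minimal stand-in for the Shape base class
    def __init__(self):
        self.x = 0
        self.y = 0

# Step 1: the class body as source text
body = """
def __init__(self, radius):
    Shape.__init__(self)
    self.radius = radius
def area(self):
    return math.pi * self.radius ** 2
def perimeter(self):
    return 2 * math.pi * self.radius
"""

# Step 2: execute the body; definitions land in a fresh dictionary
env = {"Shape": Shape, "math": math}   # Globals visible to the body
namespace = {}
exec(body, env, namespace)

# Step 3: build the class from its name, bases, and dictionary
Circle = type("Circle", (Shape,), namespace)

c = Circle(4)
print(sorted(namespace), c.area())
```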
  792. Copyright (C) 2008, http://www.dabeaz.com 5- The Metaclass Hook • Python

    provides a hook that allows you to intercept the class creation step • Using this, you can feed the "class" into something other than "type" • In other words, you could come up with something very different than a normal class 167
  793. Copyright (C) 2008, http://www.dabeaz.com 5- Metaclass Selection • __metaclass__ attribute

    • Sets the metaclass that's used for construction • May be a class attribute or a global variable class Foo: __metaclass__ = type def bar(self): print "Foo.bar" 168 __metaclass__ = type class Foo: ... class Bar: ...
  794. Copyright (C) 2008, http://www.dabeaz.com 5- New Metaclasses • By changing

    the metaclass hook, you can create your own magic types • For example, inherit from type and tweak it 169
  795. Copyright (C) 2008, http://www.dabeaz.com 5- Creating a Metaclass • Usually,

    you inherit from type and redefine __new__ class mytype(type): def __new__(cls,name,bases,__dict__): print "Creating class : ", name print "Base classes : ", bases print "Attributes : ", __dict__.keys() return type.__new__(cls,name,bases,__dict__) 170 • Then you define objects that hook to it class myobject: __metaclass__ = mytype
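The `__metaclass__` attribute shown on these slides is Python 2 syntax. In Python 3 the metaclass is given as a keyword argument in the class statement instead. A rough equivalent of the slide's example, with a hypothetical `created` list standing in for the print statements:

```python
created = []                     # Records each class as it gets built

class mytype(type):
    def __new__(cls, name, bases, namespace):
        created.append((name, bases))
        return type.__new__(cls, name, bases, namespace)

# Python 3 spelling of "__metaclass__ = mytype"
class myobject(metaclass=mytype):
    pass

class child(myobject):           # Subclasses pick up the metaclass too
    pass

print(created)
print(type(child))
```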
  796. Copyright (C) 2008, http://www.dabeaz.com 5- Metaclass Applications • __new__ method

    provides class name, base classes, and dictionary prior to class creation • Can inspect this information • Can modify this information • If you know what you are doing, can be used for a variety of useful/diabolical purposes 171
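As one illustration of "inspect" and "modify", here is a sketch (Python 3 syntax) of a metaclass that refuses classes with undocumented public methods and adds a `method_names` attribute to the ones it accepts. The `documented` metaclass and the `method_names` attribute are invented for this example; they are not part of Python or of the slides.

```python
class documented(type):
    def __new__(cls, name, bases, namespace):
        # Inspect: find the public callables defined in the class body
        public = [key for key, val in namespace.items()
                  if callable(val) and not key.startswith('_')]
        missing = [key for key in public if not namespace[key].__doc__]
        if missing:
            raise TypeError("%s has undocumented methods: %s"
                            % (name, sorted(missing)))
        # Modify: add an extra attribute before the class is created
        namespace['method_names'] = sorted(public)
        return type.__new__(cls, name, bases, namespace)

class Good(metaclass=documented):
    def spam(self):
        "Does something useful"
        return 1

try:
    class Bad(metaclass=documented):
        def spam(self):          # No docstring, so creation is refused
            return 1
    rejected = False
except TypeError:
    rejected = True

print(Good.method_names, rejected)
```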
  797. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • Metaclasses are probably

    the most advanced and misunderstood part of Python • However, used widely by framework developers • Can be used to perform very interesting things with objects • Where do you go after reaching level 11? Metaclasses. 172
  798. Copyright (C) 2008, http://www.dabeaz.com 5- Interlude • The Python implementation

    of objects is based on a simple idea (just use dictionaries) • However, there are some subtle complications of that approach. • The co-mingling of data and functions means that you have to play some games with wrappers (descriptors) to get it to work • Otherwise, it's similar to what we saw before 173
  799. Copyright (C) 2008, http://www.dabeaz.com 5- Interlude • The Python approach

    has influenced others • Ruby : No way! • Perl : Let's do that! • Javascript : Objects, dictionaries, what's the difference? 174
  800. Copyright (C) 2008, http://www.dabeaz.com 5- Perl Objects • Perl has

    support for OO programming • It is generally acknowledged that the whole idea for it was taken straight out of Python. • Guido (Python) and Larry Wall (Perl) had previously interacted at conferences • And Perl already had a dictionary type (Hash) 176
  801. Copyright (C) 2008, http://www.dabeaz.com 5- Instance Variables • An object

    has to store instance variables • Let's just put them in a hash sub new { my $radius = shift; my $instance_data = { "radius" => $radius }; return $instance_data; } 177 • Hey, Python used a dictionary after all...
  802. Copyright (C) 2008, http://www.dabeaz.com 5- Methods • Let's write functions

    that use the hash sub area { my $self = shift; return $PI*$self->{'radius'}**2; } sub perimeter { my $self = shift; return 2*$PI*$self->{'radius'}; } 178 • These are just normal functions my $c = new(4); print(area($c),"\n");
  803. Copyright (C) 2008, http://www.dabeaz.com 5- Packages • Perl has packages

    which are a namespace package Circle; sub new { my $radius = shift; my $instance_data = { "radius" => $radius }; return $instance_data; } sub area { my $self = shift; return $PI*$self->{'radius'}**2; } sub perimeter { my $self = shift; return 2*$PI*$self->{'radius'}; } 179
  804. Copyright (C) 2008, http://www.dabeaz.com 5- Packages • With namespaces, we're

    real close... $c = Circle::new(4); print(Circle::area($c),"\n"); ... 180 • All of the methods are packaged together • Similar to a class • Recall from last lecture : This was one way that classes came about
  805. Copyright (C) 2008, http://www.dabeaz.com 5- Blessing Things • Perl can

    "bless" data into a package 181 package Circle; sub new { my $radius = shift; my $instance_data = { "radius" => $radius }; bless $instance_data,"Circle"; return $instance_data; } • This sets an attribute on the hash to point to the package name supplied • Aha! So that's a link to the class (the package)
  806. Copyright (C) 2008, http://www.dabeaz.com 5- Blessing Things • Using the

    "blessed" object 182 $c = Circle::new(4); print($c->area(),"\n"); print($c->perimeter(),"\n"); • This gives us the -> syntax for methods
  807. Copyright (C) 2008, http://www.dabeaz.com 5- Patching up Constructors • A

    more OO-syntax: 183 $c = Circle->new(4); print($c->area(),"\n"); print($c->perimeter(),"\n"); • Requires a slight change to the new function sub new { my ($pkg,$radius) = @_; # Get name and argument my $instance_data = { "radius" => $radius }; bless $instance_data, $pkg; return $instance_data; }
  808. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • We now have

    basic "objects" • Hash tables tied to a package with methods • Some convenient syntax (->, ->new) • Next : Inheritance 184
  809. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance • Just have a

    special variable with base classes 185 package Shape; sub new { my $pkg = shift; my $instance = {"x" => 0, "y" => 0 }; bless $instance,$pkg; return $instance; } sub move { my ($self,$dx,$dy) = @_; $self->{'x'} += $dx; $self->{'y'} += $dy; } package Circle; @ISA = ("Shape"); # Inherit from Shape
  810. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance • Initializing the base

    186 package Circle; @ISA = ("Shape"); # Inherit from Shape sub new { my ($pkg,$radius) = @_; my $instance = $pkg->SUPER::new(); $instance->{'radius'} = $radius; return $instance; } ...
  811. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • There are a

    few more details • Basically, you're just linking hash tables and packages together • Hash table is the instance data • Package is the class • Variables in the package set up inheritance 187
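The "hash table plus package" idea can be mimicked in Python to make the moving parts visible. This is only a loose analogy written for this transcript, not how Perl is actually implemented: the `packages` table, the `bless` and `call` functions, and the `__package__` key are all hypothetical stand-ins for Perl's packages, `bless`, the `->` operator, and the link that blessing creates.

```python
import math

# "Packages": a namespace of functions plus an ISA list of parent packages
packages = {
    "Shape":  {"ISA": [], "subs": {}},
    "Circle": {"ISA": ["Shape"], "subs": {}},
}

def move(self, dx, dy):
    self["x"] += dx
    self["y"] += dy
packages["Shape"]["subs"]["move"] = move

def area(self):
    return math.pi * self["radius"] ** 2
packages["Circle"]["subs"]["area"] = area

def bless(data, pkg):
    data["__package__"] = pkg    # Link the instance data to its package
    return data

def call(obj, name, *args):
    # Search the blessed package first, then its ISA list, depth-first
    pending = [obj["__package__"]]
    while pending:
        pkg = packages[pending.pop(0)]
        if name in pkg["subs"]:
            return pkg["subs"][name](obj, *args)
        pending = pkg["ISA"] + pending
    raise AttributeError(name)

c = bless({"x": 0, "y": 0, "radius": 4}, "Circle")
call(c, "move", 2, 3)            # Found in Shape via Circle's ISA list
print(c["x"], c["y"], call(c, "area"))
```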
  812. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary • The Perl object

    system is actually more flexible than you might imagine • For example, you don't technically have to use a Hash object to store data • You can implement objects in different ways • Many other customization features 188
  813. Copyright (C) 2008, http://www.dabeaz.com 5- Javascript Objects • Javascript doesn't

    really have an OO system based on classes per se. • Instead, it just merges arrays and objects together • Essentially : An associative array is an object. An object is an associative array. 190
  814. Copyright (C) 2008, http://www.dabeaz.com 5- Creating an "Object" • Here's

    some instance data 191 var c = { 'radius' : 4, 'x' : 0, 'y' : 0 }; • Once you've done that, you can access the data in two different ways document.writeln(c['radius']); document.writeln(c.radius); • The (.) operator is just an array lookup
  815. Copyright (C) 2008, http://www.dabeaz.com 5- Define a Method 192 var

    c = { 'radius' : 4, 'x' : 0, 'y' : 0 }; c.area = function() { return 3.1415926*this.radius*this.radius; } • Just attach a function to an array • Call the method a = c.area(); • Inside the function, 'this' refers to the array
  816. Copyright (C) 2008, http://www.dabeaz.com 5- Constructor Functions 193 function Circle(radius)

    { this.radius = radius; } c = new Circle(4); • Any function can be a "constructor" • If you call a function like this, 'this' is already set up to point to an empty array • You just place values into it
  817. Copyright (C) 2008, http://www.dabeaz.com 5- A Simple Object 194 function

    Circle(radius) { this.radius = radius; this.area = function() { return PI*this.radius*this.radius; } this.perimeter = function() { return 2*PI*this.radius; } } c = new Circle(4); a = c.area(); p = c.perimeter(); • Just write a function and define methods
  818. Copyright (C) 2008, http://www.dabeaz.com 5- A Problem 195 • One

    problem with this approach • Methods get defined and stored in every single instance function Circle(radius) { this.radius = radius; this.area = function() { return PI*this.radius*this.radius; } this.perimeter = function() { return 2*PI*this.radius; } } • Needless to say, that isn't very efficient
  819. Copyright (C) 2008, http://www.dabeaz.com 5- Prototype Objects 196 function Circle(radius)

    { this.radius = radius; } Circle.prototype.area = function() { return PI*this.radius*this.radius; } Circle.prototype.perimeter = function() { return 2*PI*this.radius; } • A function has a hidden "prototype" attached • A prototype is just another object (array)
  820. Copyright (C) 2008, http://www.dabeaz.com 5- Prototype Objects 197 c =

    new Circle(4); r = c.radius; // From array c a = c.area(); // From Circle.prototype • If a function has a prototype attached, a link to the prototype gets carried along with any object that gets created • Attribute lookup will go to the prototype if it can't be found in the array itself { 'radius' : 4 } { 'area' : <func> 'perimeter' : <func> } Circle.prototype
  821. Copyright (C) 2008, http://www.dabeaz.com 5- Classes and Prototypes 198 •

    Prototypes look a lot like a class • Every object has its own data • But, each object is also linked to a prototype that can supply values as a fallback
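Since this course leans on Python for its running examples, here is the prototype fallback idea sketched in Python 3. The `Obj` class and its `proto` attribute are invented for this illustration and only approximate what a Javascript engine does: each object keeps its own data and follows a link to a prototype when a name isn't found locally.

```python
import math

class Obj(object):
    def __init__(self, proto=None, **values):
        self.proto = proto               # Link to the prototype (may be None)
        self.__dict__.update(values)     # Per-object data

    def __getattr__(self, name):
        # Called only when the name isn't in the object's own dictionary.
        obj = self.__dict__.get('proto')
        while obj is not None:
            if name in obj.__dict__:
                value = obj.__dict__[name]
                if callable(value):
                    # Bind the function to the original object ("this")
                    return lambda *args: value(self, *args)
                return value
            obj = obj.__dict__.get('proto')
        raise AttributeError(name)

def area(this):
    return math.pi * this.radius ** 2

circle_proto = Obj(area=area)            # Shared by every "Circle"
c = Obj(circle_proto, radius=4)
d = Obj(circle_proto, radius=1)
print(c.area(), d.area())
```

The `area` function lives in exactly one place (the prototype), which is the point of the earlier "A Problem" slide.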
  822. Copyright (C) 2008, http://www.dabeaz.com 5- Inheritance 199 • Since Javascript

    doesn't really have classes, there is no "class-based" inheritance • However, you can play funny games with linking prototypes together • This gets rather ugly in a hurry
  823. Copyright (C) 2008, http://www.dabeaz.com 5- Prototype Inheritance 200 function Shape()

    { this.x = 0; this.y = 0; } Shape.prototype.move = function(dx,dy) { this.x += dx; this.y += dy; } function Circle(radius) { Shape.call(this); this.radius = radius; } Circle.prototype = new Shape(); delete Circle.prototype.x; delete Circle.prototype.y; Circle.prototype.constructor = Circle; Circle.prototype.area = function() { return PI*this.radius*this.radius; } We create a Shape and use it as the prototype. However, we have to patch it up a bit.
  824. Copyright (C) 2008, http://www.dabeaz.com 5- Borrowing Methods 201 function Shape()

    { this.x = 0; this.y = 0; } Shape.prototype.move = function(dx,dy) { this.x += dx; this.y += dy; } function Circle(radius) { Shape.call(this); this.radius = radius; } for (m in Shape.prototype) { if (typeof Shape.prototype[m] != "function") continue; Circle.prototype[m] = Shape.prototype[m]; } Circle.prototype.area = function() { return PI*this.radius*this.radius; } Here we're just copying functions from one prototype to another
  825. Copyright (C) 2008, http://www.dabeaz.com 5- Commentary 202 • Javascript is

    probably the logical extension of using hash tables/arrays to represent objects • It essentially just merges them together • Functions are set up to receive the array as "this" if they're invoked through an array
  826. Copyright (C) 2008, http://www.dabeaz.com 5- Big Picture 204 • We

    have taken a very detailed tour of how objects work in a variety of languages • There were some common themes • Covered many subtle differences between implementations
  827. Copyright (C) 2008, http://www.dabeaz.com 5- Wrap-up 205 • Your brain

    probably hurts by now • Next time, we'll look at some common design patterns related to using objects • Will shift gears into some other topics
  828. Copyright (C) 2008, http://www.dabeaz.com 6- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  829. Copyright (C) 2008, http://www.dabeaz.com 6- Overview 3 • Files, File

    systems, and I/O • Processes and subprocesses • Text parsing and pattern matching • Inside regular expressions
  830. Copyright (C) 2008, http://www.dabeaz.com 6- Big Applications 5 • Let's

    be honest, most "serious" computing applications tend to be written in C, C++, Java, or some kind of "compiled" language • It's partly for performance (a C program may be 100x faster than an equivalent script) • Also for extra safety. The compiler has strict rules and performs all kinds of program checking (to catch errors before you run)
  831. Copyright (C) 2008, http://www.dabeaz.com 6- Reading/Writing Data 6 • Large

    applications don't exist in total isolation • They always have to read/write data • The data may arrive in many ways (files, pipes, network, etc.) • And many possible formats
  832. Copyright (C) 2008, http://www.dabeaz.com 6- Reality 7 • Real programmers

    don't use one application for everything (well, let's exclude emacs) • You solve problems by using many different applications for different kinds of tasks • You use the best tool for the job • Much of day-to-day work is involved in simply moving data around between applications
  833. Copyright (C) 2008, http://www.dabeaz.com 6- Example 8

    [Diagram: separate applications (Data Acquisition, Modeling Application, Imaging, Analysis, Database, WWW) exchanging params, files, images, and web pages]
  834. Copyright (C) 2008, http://www.dabeaz.com 6- Example 9 • In that

    picture, each component is a completely separate application • May be written in different languages • Developed completely independently • May be legacy code that can't be replaced
  835. Copyright (C) 2008, http://www.dabeaz.com 6- Personal Experience 10 • When

    I started working with dynamic languages in 1995, I was a programmer on a large scientific computing project • About 80% of our time was spent futzing around with data files (moving them around, converting them, making them work with other programs, etc.) • It was a huge pain.
  836. Copyright (C) 2008, http://www.dabeaz.com 6- Dynamic Languages 11

    [Diagram: the same kind of picture (Data Acquisition, Modeling Application, Imaging, Analysis, Database, exchanging params and files), now with a Python Script component]
  837. Copyright (C) 2008, http://www.dabeaz.com 6- Why Dynamic Languages? 12 •

    Very easy to develop and reconfigure • Can handle a huge variety of data formats • You're taking a problem which is inherently messy and solving with a language that is adept at solving messy problems.
  838. Copyright (C) 2008, http://www.dabeaz.com 6- Files 14 • Files are

    probably the most basic form of handling data • Programs write files as output • You create files that serve as input • Let's talk about some basic concepts...
  839. Copyright (C) 2008, http://www.dabeaz.com 6- File Implementation 15 • At

    the lowest level, a file is a byte sequence • All operations concerning files are focused around manipulating that byte sequence (reading it, writing it, modifying it, etc.) • To the operating system, there is nothing particularly "special" about any given file • It's just a bunch of bytes...
  840. Copyright (C) 2008, http://www.dabeaz.com 6- Opening a File 16 •

    To use a file, it must first be "opened" • Example: f = open("somefile.txt","r") # Open for read f = open("somefile.txt","w") # Open for write f = open("somefile.txt","a") # Open for append • This gives you an object with basic operations f.read(maxbytes) # Read N bytes f.write(text) # Write to a file f.close() # Close the file
  841. Copyright (C) 2008, http://www.dabeaz.com 6- The File API 17 •

    The programming model for most languages is taken from low-level system calls (POSIX) open(filename,mode,flags) # Open a file read(fd,buffer,maxsize) # Read into a buffer write(fd,buffer,nbytes) # Write a buffer close(fd) # Close a file seek(fd,offset,origin) # Seek to a new position tell(fd) # Get file pointer • It might be cleaned up a bit, but usually it's not much different than this
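Python exposes these low-level calls fairly directly through the `os` module, which makes it easy to see how thin the higher-level file object is. A sketch in Python 3, using a temporary file as a stand-in for a real data file; `tempfile.mkstemp()` is used here because it returns an already-open file descriptor.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()            # Open descriptor plus the file's path
os.write(fd, b"0123456789")              # write(fd, buffer)
os.lseek(fd, 2, os.SEEK_SET)             # seek(fd, offset, origin)
chunk = os.read(fd, 4)                   # read(fd, maxsize)
position = os.lseek(fd, 0, os.SEEK_CUR)  # "tell": seek 0 bytes from current
os.close(fd)                             # close(fd)
os.remove(path)

print(chunk, position)
```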
  842. Copyright (C) 2008, http://www.dabeaz.com 6- File Internals 18 f =

    open("foo.txt","r") mode : r flags : XX fp : 0 Operating System • Opening a file creates an OS data structure • The contents are not visible • Holds the state of the "file"
  843. Copyright (C) 2008, http://www.dabeaz.com 6- File Pointer 19 • Most

    useful internal state is the file pointer • Keeps track of current file position f = open("foo.txt","r") data = f.read(10) data = f.read(15) read(10) read(15) mode : r flags : XX fp : 25 Operating System foo.txt
  844. Copyright (C) 2008, http://www.dabeaz.com 6- Seek and Tell 20 •

    Manipulation of the file pointer >>> f = open("foo.txt","r") >>> f.seek(1024) # Set fp >>> data = f.read(76) >>> f.tell() 1100 >>> • It's exactly the same in most other languages
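The same sequence can be tried without having a `foo.txt` on hand. A sketch in Python 3 that creates a 2048-byte scratch file first (a hypothetical stand-in for the slide's file) and opens it in binary mode so that positions are plain byte offsets:

```python
import os
import tempfile

# Make a scratch file large enough to seek around in
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"x" * 2048)

with open(path, "rb") as f:
    f.seek(1024)                 # Set the file pointer
    data = f.read(76)            # Reading advances it by 76 bytes
    position = f.tell()

os.remove(path)
print(len(data), position)
```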
  845. Copyright (C) 2008, http://www.dabeaz.com 6- Multiple Open Files 21 •

    The same file can be open in more than one place at a time (even in the same program) • Each time you open a file, you get a new file object with a separate file pointer • Although each file is managed separately, they all operate on the same underlying data
  846. Copyright (C) 2008, http://www.dabeaz.com 6- Example 22 • Multiple file

    pointers >>> f = open("foo.txt","r") >>> f.readline() 'Hello World\n' >>> f.readline() 'This is a test\n' >>> g = open("foo.txt","r") >>> g.readline() 'Hello World\n' >>> f.tell() 27 >>> g.tell() 12 >>>
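A self-contained way to try this, sketched in Python 3 with a temporary file holding the same two lines. Binary mode is used so `tell()` reports plain byte offsets: 'Hello World\n' is 12 bytes and 'This is a test\n' is 15 bytes, so after two `readline()` calls the first pointer sits at 12 + 15.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"Hello World\nThis is a test\n")

f = open(path, "rb")
g = open(path, "rb")             # Second, independent file object
f.readline()
f.readline()
g.readline()
positions = (f.tell(), g.tell()) # Each object tracks its own pointer
f.close()
g.close()
os.remove(path)

print(positions)
```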
  847. Copyright (C) 2008, http://www.dabeaz.com 6- File Updates/Changes 23 • Changes

    to a file are reflected everywhere • If a file is opened for reading and the file contents get modified behind the scenes, those changes will affect subsequent read operations • Basically, everything stays in sync. • Details are covered in an OS class
  848. Copyright (C) 2007, http://www.dabeaz.com 6- Text Files 24 • By

    default, files are opened in text mode f = open(filename,"r") # Read, text mode f = open(filename,"w") # Write, text mode f = open(filename,"a") # Append, text mode • Text mode assumes line orientation • However, what is a line? some characters .......\n (Unix) some characters .......\r\n (Windows) some characters .......\r (Classic Mac) • This determination is made by the system
  849. Copyright (C) 2007, http://www.dabeaz.com 6- Newline Handling 25 • When

    writing, '\n' is translated to system newline >>> f = open("test.txt","w") >>> f.write("Hello World\n") >>> f.close() • Unix test.txt: Hello World\n • Windows test.txt: Hello World\r\n
  850. Copyright (C) 2007, http://www.dabeaz.com 6- Newline Handling 26 • When

    reading, system newline is converted back to the standard '\n' character >>> f = open("test.txt","r") >>> f.read() 'Hello World\n' >>> • Mostly, you don't have to worry about it • .... except if you do cross-platform work
  851. Copyright (C) 2007, http://www.dabeaz.com 6- Cross Platform Text Files 27

    • Example: Reading a Windows text file on Unix >>> f = open("test.txt","r") >>> f.readlines() ['Hello\r\n', 'World\r\n'] >>> • Here, you get that extra '\r' in the input • Which may break code not expecting it
  852. Copyright (C) 2007, http://www.dabeaz.com 6- Binary Files 28 • Binary

    data requires a different I/O mode f = open(filename,"rb") # Read, binary mode f = open(filename,"wb") # Write, binary mode f = open(filename,"ab") # Append, binary mode • Disables all newline translation (reads/writes) • Required for binary data on Windows • Optional, but supported on Unix (gotcha)
  853. Copyright (C) 2007, http://www.dabeaz.com 6- Binary File Example 29 •

    Difference between modes >>> open("example.txt","r").read() 'Hello World\n' >>> open("example.txt","rb").read() 'Hello World\r\n' >>> • Notice untranslated newline
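For anyone trying this on a current interpreter: the slides describe Python 2, where text mode defers to the operating system. Python 3 behaves a bit differently. Text mode applies "universal newlines" translation by default on every platform, and binary mode returns `bytes`. A sketch using a temporary file with a Windows-style line ending:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"Hello World\r\n")        # Windows-style line ending

with open(path, "r") as f:               # Text mode: newline translated
    text = f.read()
with open(path, "rb") as f:              # Binary mode: bytes left untouched
    raw = f.read()
os.remove(path)

print(repr(text), repr(raw))
```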
  854. Copyright (C) 2007, http://www.dabeaz.com 6- Commentary 30 • Sadly, this

    business with text vs. binary is part of the operating system itself • All programming languages and applications on the system face the same issue • It's one reason why data is sometimes corrupted when transferred between systems (unintended newline expansion)
  855. Copyright (C) 2008, http://www.dabeaz.com 6- Concept: Processes • A "process"

    is a running program • Has its own dedicated resources • Memory, open files, net connections, etc. • Runs independently (own stack, PC, etc.) • Isolated from other processes • Closely associated with an "application" 32
  856. Copyright (C) 2008, http://www.dabeaz.com 6- The Interpreter Process • Dynamic

    programs usually execute inside an interpreter (which is the process) 33
  857. Copyright (C) 2008, http://www.dabeaz.com 6- Subprocesses • A program can

    create a new process • This is called a "subprocess" • The subprocess often runs under the control of the original process (which is known as the "parent" process) • Parent often wants to collect output or the status result of the subprocess 34
  858. Copyright (C) 2008, http://www.dabeaz.com 6- Subprocess Control • When launching

    a subprocess, the parent typically has control over the following: • Command line arguments • Environment variables • Standard I/O streams • Signal handling 35
  859. Copyright (C) 2008, http://www.dabeaz.com 6- Command Line Arguments • A

    list of strings 36 shell % foo.exe arg1 arg2 ... arg3 • In the target process, these show up in argv C/C++: int main(int argc, char *argv[]) { ... } Java: public static void main(String argv[]) { ... } Python: sys.argv Perl: @ARGV
  860. Copyright (C) 2008, http://www.dabeaz.com 6- Environment Variables • A hash-table

    of string values 37 shell % setenv NAME VALUE shell % foo.exe • In the target process C/C++: char *value = getenv("NAME"); Python: value = os.environ['NAME'] Perl: $value = $ENV{'NAME'};
  861. Copyright (C) 2008, http://www.dabeaz.com 6- Standard I/O Streams • A

    set of files (stdin, stdout, stderr) 38 shell % foo.exe >out.txt shell % foo.exe <in.txt shell % foo.exe | bar.exe subprocess stdin stdout stderr • In the shell, controlled via redirection/pipes • Parent process sets up these files for subprocess
  862. Copyright (C) 2008, http://www.dabeaz.com 6- Signals • A parent process

    can signal a subprocess 39 shell % kill -signo pid shell % subprocess • Examples : suspend, terminate, etc. • On Unix, this is the "kill" command parent signal • On Windows, support is weak/nonexistent
  863. Copyright (C) 2008, http://www.dabeaz.com 6- Status Codes • When subprocess

    terminates, it returns a status • An integer code of some kind 40 C: exit(status); Java: System.exit(status); Python: raise SystemExit(status) • Convention is for 0 to indicate "success." Anything else is an error.
  864. Copyright (C) 2008, http://www.dabeaz.com 6- Commentary • Keep in mind

    that subprocesses are almost entirely independent from the parent • The parent can set up the environment, send signals, and collect return codes, but otherwise has no control over what happens inside the subprocess. 41
  865. Copyright (C) 2008, http://www.dabeaz.com 6- Running a Subprocess $a =

    `ls -l`; # Backticks. Perl/Ruby • Support for this varies • There may be some simple options 42 • This runs a shell command and captures the output. • However, it's lacking for a lot of other things • Will often see a process management module. Will illustrate for Python.
  866. Copyright (C) 2008, http://www.dabeaz.com 6- subprocess Module • A high-level

    module for subprocesses • Cross-platform (Unix/Windows) • Tries to consolidate the functionality of a wide assortment of low-level system calls (system(), popen(), exec(), spawn(), etc.) • Will illustrate with some common use cases 43
  867. Copyright (C) 2008, http://www.dabeaz.com 6- Executing Commands Problem: You want

    to execute a simple shell command or run a separate program. You don't care about capturing its output. import subprocess p = subprocess.Popen(['mkdir','temp']) q = subprocess.Popen(['rm','-f','tempdata']) • Executes a command string • Returns a Popen object (more in a minute) 44
  868. Copyright (C) 2008, http://www.dabeaz.com 6- Specifying the Command subprocess.Popen(['rm','-f','tempdata']) •

    Popen() accepts a list of command args 45 • These are the same as the args in the shell shell % rm -f tempdata • Note: Each "argument" is a separate item subprocess.Popen(['rm','-f','tempdata']) # Good subprocess.Popen(['rm','-f tempdata']) # Bad Don't merge multiple arguments into a single string like this.
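The difference between the good and bad forms is easy to see by launching a child that reports how many arguments it received. A sketch in Python 3; it uses `sys.executable` (the running interpreter) as a portable stand-in for `rm`, and the `show_args` one-liner is a hypothetical helper. It also borrows `stdout=subprocess.PIPE`, which the slides cover a bit later.

```python
import subprocess
import sys

# Child program: print the number of arguments it was given
show_args = "import sys; print(len(sys.argv) - 1)"

good = subprocess.Popen([sys.executable, "-c", show_args, "-f", "tempdata"],
                        stdout=subprocess.PIPE)
bad = subprocess.Popen([sys.executable, "-c", show_args, "-f tempdata"],
                       stdout=subprocess.PIPE)

good_count = int(good.communicate()[0].decode().strip())
bad_count = int(bad.communicate()[0].decode().strip())
print(good_count, bad_count)     # The merged string arrives as ONE argument
```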
  869. Copyright (C) 2008, http://www.dabeaz.com 6- PATH Environment >>> os.environ['PATH'] '/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:...'

    >>> • When launching a command, Popen() uses the setting of the PATH environment variable to search for the subprocess program 46 • Changes affect subsequent Popen() calls os.environ['PATH']="/mypath/bin:"+os.environ['PATH'] p = subprocess.Popen(["foo"])
  870. Copyright (C) 2008, http://www.dabeaz.com 6- Environment Vars env_vars = {

    'NAME1' : 'VALUE1', 'NAME2' : 'VALUE2', ... } p = subprocess.Popen(['cmd','arg1',...,'argn'], env=env_vars) • How to set up environment variables 47 • Note : If this is supplied and there is a PATH environment variable, it will be used to search for the command (Unix)
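A runnable sketch of the `env` argument in Python 3. One assumption worth flagging: rather than building the dictionary from scratch as on the slide, this version starts from a copy of the parent's environment, since passing a bare dictionary drops variables such as PATH that the child may rely on. `sys.executable` stands in for the slide's `cmd`.

```python
import os
import subprocess
import sys

env_vars = dict(os.environ)      # Copy of the parent's environment
env_vars['NAME1'] = 'VALUE1'     # Extra variable for the child only

child = "import os; print(os.environ['NAME1'])"
p = subprocess.Popen([sys.executable, "-c", child],
                     env=env_vars, stdout=subprocess.PIPE)
output = p.communicate()[0].decode().strip()
print(output)
```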
  871. Copyright (C) 2008, http://www.dabeaz.com 6- Current Directory p = subprocess.Popen(['cmd','arg1',...,'argn'],

    cwd='/some/directory') • If you need to change the working directory 48 • Note: This changes the working directory for the subprocess, but does not affect how Popen() searches for the command
  872. Copyright (C) 2008, http://www.dabeaz.com 6- Collecting Status Codes p =

    subprocess.Popen(['cmd','arg1',...,'argn']) ... status = p.wait() • When you launch a subprocess, it runs independently from the parent • To wait and collect status, use wait() 49 • Status will be the integer return code (which is also stored) p.returncode # Exit status of subprocess
  873. Copyright (C) 2008, http://www.dabeaz.com 6- Polling a Subprocess p =

    subprocess.Popen(['cmd','arg1',...,'argn']) ... if p.poll() is None: # Process is still running else: status = p.returncode # Get the return code • poll() - Checks status of subprocess 50 • Returns None if the process is still running, otherwise the returncode is returned
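The wait() and poll() calls can be combined into a simple polling loop. A minimal sketch, assuming Python 3; the child command (sleep briefly, then exit with status 3) is made up for the illustration.

```python
import subprocess
import sys
import time

# Hypothetical child: sleeps for a moment, then exits with status 3.
p = subprocess.Popen(
    [sys.executable, "-c", "import sys, time; time.sleep(0.2); sys.exit(3)"]
)

polls = 0
while p.poll() is None:      # None means the child is still running
    polls += 1
    time.sleep(0.05)         # do other work here instead of sleeping

status = p.returncode        # set once poll() or wait() sees the exit
also_status = p.wait()       # returns immediately; the child has already finished
```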
  874. Copyright (C) 2008, http://www.dabeaz.com 6- Killing a Subprocess p =

    subprocess.Popen(['cmd','arg1',...,'argn']) import os os.kill(p.pid,9) # • A notable omission (subprocess module provides no such functionality). • On Unix, can use os.kill() 51 • On Windows, a mess (many options) subprocess.Popen(['TASKKILL','/PID',str(p.pid),'/F']) import win32api win32api.TerminateProcess(int(p._handle),-1)
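Later Python releases (2.6 onward) added terminate() and kill() methods to Popen objects, which cover this case on both Unix and Windows. A minimal sketch, assuming Python 3 and a hypothetical long-running child:

```python
import subprocess
import sys

# Hypothetical child that would otherwise run for a minute.
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

p.kill()             # SIGKILL on Unix, TerminateProcess() on Windows
status = p.wait()    # reap the child and collect its status
```

On Unix the status of a killed child is reported as the negative signal number; on Windows it is whatever exit code TerminateProcess() was given.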
  875. Copyright (C) 2008, http://www.dabeaz.com 6- Capturing Output Problem: You want

    to execute another program and capture its output • Use additional options to Popen() import subprocess p = subprocess.Popen(['cmd'], stdout=subprocess.PIPE) data = p.stdout.read() • This works with both Unix and Windows • Captures any output printed to stdout 52
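A runnable variant of the capture pattern, assuming Python 3, where the pipe delivers bytes that must be decoded explicitly. The child command is a hypothetical stand-in for 'cmd'.

```python
import subprocess
import sys

p = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
)
data = p.stdout.read()       # bytes in Python 3
p.stdout.close()
p.wait()

text = data.decode().strip()
```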
  876. Copyright (C) 2008, http://www.dabeaz.com 6- Sending/Receiving Data Problem: You want

    to execute a program, send it some input data, and capture its output • Set up pipes using Popen() p = subprocess.Popen(['cmd'], stdin = subprocess.PIPE, stdout = subprocess.PIPE) p.stdin.write(data) # Send data p.stdin.close() # No more input result = p.stdout.read() # Read output python cmd p.stdout p.stdin stdin stdout 53
  877. Copyright (C) 2008, http://www.dabeaz.com 6- Sending/Receiving Data Problem: You want

    to execute a program, send it some input data, and capture its output • Set up pipes using Popen() p = subprocess.Popen(['cmd'], stdin = subprocess.PIPE, stdout = subprocess.PIPE) p.stdin.write(data) # Send data p.stdin.close() # No more input result = p.stdout.read() # Read output python cmd p.stdout p.stdin stdin stdout 54 Pair of files that are hooked up to the subprocess
  878. Copyright (C) 2008, http://www.dabeaz.com 6- Sending/Receiving Data • How to

    capture stderr p = subprocess.Popen(['cmd'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) python cmd p.stdout p.stdin stdin stdout 55 p.stderr stderr • Note: stdout/stderr can also be merged p = subprocess.Popen(['cmd'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
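To see the merged form in action, here is a minimal sketch, assuming Python 3 and a hypothetical child that writes one line to each stream:

```python
import subprocess
import sys

# Hypothetical child: one line on stdout, one on stderr.
CODE = "import sys; print('out', flush=True); print('err', file=sys.stderr)"

p = subprocess.Popen(
    [sys.executable, "-c", CODE],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,    # send stderr down the stdout pipe
)
merged, nothing = p.communicate()
lines = merged.decode().split()
```

Because stderr was redirected rather than piped, the second value returned by communicate() is None and both lines arrive on the single stdout pipe.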
  879. Copyright (C) 2008, http://www.dabeaz.com 6- I/O Redirection • Connecting input

    to a file f_in = open("somefile","r") p = subprocess.Popen(['cmd'], stdin=f_in) 56 • Connecting the output to a file f_out = open("somefile","w") p = subprocess.Popen(['cmd'], stdout=f_out) • Basically, stdin and stdout can be connected to any open file object • Note : Must be a real file in the OS
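A sketch of output redirection to a real file, assuming Python 3. The temporary file and the child command are illustrative choices, not part of the original slides.

```python
import os
import subprocess
import sys
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# The file object must wrap a real OS-level file descriptor.
with open(path, "w") as f_out:
    p = subprocess.Popen([sys.executable, "-c", "print('saved')"], stdout=f_out)
    p.wait()

with open(path) as f:
    contents = f.read()
os.remove(path)
```

An in-memory object such as io.StringIO would not work here because it has no underlying file descriptor for the child to inherit.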
  880. Copyright (C) 2008, http://www.dabeaz.com 6- Subprocess I/O • Subprocess module

    can be used to set up fairly complex I/O patterns 57 import subprocess p1 = subprocess.Popen("ls -l", shell=True, stdout=subprocess.PIPE) p2 = subprocess.Popen("wc",shell=True, stdin=p1.stdout, stdout=subprocess.PIPE) out = p2.stdout.read() • Note: this is the same as this popen2.popen2("ls -l | wc")
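The same two-stage pipeline can be built without shell=True. This is a sketch assuming Python 3, with two hypothetical Python one-liners standing in for `ls -l` and `wc` so that it runs on any platform.

```python
import subprocess
import sys

# Stage 1 prints three lines; stage 2 counts the lines it reads.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('a'); print('b'); print('c')"],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)
producer.stdout.close()   # the parent no longer needs its copy of the pipe

out, _ = consumer.communicate()
producer.wait()
count = int(out.decode().strip())
```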
  881. Copyright (C) 2008, http://www.dabeaz.com 6- I/O Issues • Some care

    required when communicating with subprocesses 58 • To signal end of input, don't forget to close the input stream p = subprocess.Popen(['cmd'], stdout=subprocess.PIPE, stdin=subprocess.PIPE) p.stdin.write(data) # Send data p.stdin.close() # No more input result = p.stdout.read() # Read output • If you forget, subprocess may hang
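The communicate() method bundles these steps: it writes the input, closes stdin, reads all output, and waits for the child. A minimal sketch, assuming Python 3 and a hypothetical child that upper-cases its input:

```python
import subprocess
import sys

p = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
# Sends the data, closes stdin, reads stdout to EOF, then waits.
result, _ = p.communicate(b"hello\n")
reply = result.decode().strip()
```

Using communicate() also avoids the case where parent and child each block on a full pipe while waiting for the other to read.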
  882. Copyright (C) 2008, http://www.dabeaz.com 6- I/O Issues • The subprocess module

    does not work well for controlling interactive processes • Buffering behavior is often wrong (may hang) • Pipes don't properly emulate terminals • Subprocess may not operate correctly 59
  883. Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork Problem: You

    want to clone the original process and have two identical processes • fork(), wait(), _exit() import os pid = os.fork() if pid == 0: # Child process ... os._exit(0) else: # Parent process ... # Wait for child os.waitpid(pid, 0) python python fork() _exit() wait() concurrent execution 60
  884. Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork • fork()

    creates an identical process • Newly created process is a "child process" • fork() returns different values in parent/child import os pid = os.fork() if pid == 0: # Child process else: # Parent process 61 pid is 0 in child, non-zero in parent • Parent and child run independently afterwards
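A small runnable illustration of the fork/exit/wait cycle. It is Unix-only and assumes Python 3; the helper name `run_in_child` is hypothetical.

```python
import os

def run_in_child(exit_code):
    """Fork, have the child exit with exit_code, and return what the parent sees."""
    pid = os.fork()
    if pid == 0:
        # Child process: leave immediately without running cleanup handlers.
        os._exit(exit_code)
    # Parent process: pid is the child's process id.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

seen = run_in_child(7)
```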
  885. Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork • Most

    common use-case: multiple clients • Server forks a new process to handle each client 62 Server listening Server Server Server Server fork() Client Client Client Client new clients
  886. Copyright (C) 2008, http://www.dabeaz.com 6- Unix Process Fork • Note:

    There are MANY tricky details • Typically covered in an operating systems course • Consult: "Advanced Programming in the UNIX Environment" by W. Richard Stevens 63
  887. Copyright (C) 2008, http://www.dabeaz.com 6- I/O Layers • One problem

    with I/O is that data is often encoded in a variety of different formats • Compression (gz, bz2, zip, etc.) • Unicode (UTF-8, UTF-16, etc.) • Text (Base64, Hex, Quopri, etc.) • Data might be a mix of formats • Example: A compressed UTF-8 file 65
  888. Copyright (C) 2008, http://www.dabeaz.com 6- Solution • Create file-like layers

    that get stacked on top of the existing file object interface 66 File bzip2 utf-8 write write write read read read
  889. Copyright (C) 2008, http://www.dabeaz.com 6- Solution • Each layer presents

    itself as a file but is really just a wrapper around a low-level file • This kind of approach is used by Java • Also starting to show up in dynamic languages • Let's look at codecs in Python 67
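In Python 3 the same layering idea is available through the io module. The sketch below stacks a UTF-8 text layer on top of a bzip2 layer on top of an ordinary file; the temporary file and the sample text are assumptions made for the illustration.

```python
import bz2
import io
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".bz2")
os.close(fd)
text = "Jalape\u00f1o\n"

# Text layer (UTF-8) -> compression layer (bzip2) -> file on disk
with io.TextIOWrapper(bz2.BZ2File(path, "wb"), encoding="utf-8") as f:
    f.write(text)

# Reading applies the layers in the opposite order.
with io.TextIOWrapper(bz2.BZ2File(path, "rb"), encoding="utf-8") as f:
    round_trip = f.read()

with open(path, "rb") as raw:
    magic = raw.read(3)      # what is actually stored on disk
os.remove(path)
```

Each wrapper only knows about the file-like object directly beneath it, which is what makes the layers freely stackable.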
  890. Copyright (C) 2007, http://www.dabeaz.com 6- codecs.open • codecs.open(filename,mode,encoding) • Opens

    a normal file using a specific encoding • Example: 68 >>> f = codecs.open("file.bz2","rb","bz2") >>> f.read() 'Hello World\n' >>>
  891. Copyright (C) 2007, http://www.dabeaz.com 6- codecs.open 69 "base64" Base64 Encoding

    "hex" Hex encoding "bz2" Bz2 compression "quopri" MIME Quoted printable "string_escape" String with escape codes "zlib" Zip compression • Example encodings: • Examples: f = codecs.open("file.txt","r","base64") g = codecs.open("file.txt","w","quopri") ...
  892. Copyright (C) 2007, http://www.dabeaz.com 6- codecs Issues • Codecs behave

    like files, but certain operations may break the encoding process • Example: Random access/seeks • Example: Invalid data (encoding error) • Your mileage might vary 70
  893. Copyright (C) 2007, http://www.dabeaz.com 6- codecs and Strings • Strings

    and codecs are friends 71 s.encode(encoding) # Encode a string s.decode(encoding) # Decode a string • Example: >>> s = "Hello World" >>> t = s.encode("base64") >>> t 'SGVsbG8gV29ybGQ=\n' >>> t.decode("base64") 'Hello World' >>>
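For readers on Python 3: the string methods shown above only handle text encodings there, so `s.encode("base64")` no longer works. A sketch of the closest equivalents, which operate on bytes:

```python
import base64
import codecs

s = b"Hello World"

# Bytes-to-bytes codecs are reached through codecs.encode()/decode().
t = codecs.encode(s, "base64")
u = codecs.decode(t, "base64")

# The base64 module does the same job without the trailing newline.
v = base64.b64encode(s)
```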
  894. Copyright (C) 2007, http://www.dabeaz.com 6- codecs and Strings • Can

    manually handle encodings • Just call encode/decode yourself as needed 72 >>> f = open("foo.dat","wb") >>> f.write(data.encode("zlib")) >>> f.close() >>> g = open("foo.dat","rb") >>> data = g.read().decode("zlib") >>> g.close() • Note: using codecs module may be more memory efficient
  895. Copyright (C) 2007, http://www.dabeaz.com 6- Unicode • Unicode: Multibyte characters

    73 s = u"Hello World" t = u"Jalape\u00f1o" • Encodes characters from all used languages • Widely used on Internet (internationalization) • A huge topic (won't cover all details)
  896. Copyright (C) 2007, http://www.dabeaz.com 6- Unicode Representation • Internally, Python

    stores Unicode as 16-bit integers (UCS-2; some builds use 32-bit UCS-4) 74 t = u"Jalape\u00f1o" 004a 0061 006c 0061 0070 0065 00f1 006f • Normally, you don't worry about this • Except if you write a unicode string to a file u"J" --> 00 4a (Big Endian) u"J" --> 4a 00 (Little Endian)
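The byte-order issue is easy to reproduce with explicit encodings. A minimal sketch, assuming Python 3 string syntax:

```python
import codecs

big = "J".encode("utf-16-be")       # high byte first: 00 4a
little = "J".encode("utf-16-le")    # low byte first:  4a 00

# Plain "utf-16" picks the machine's byte order and prepends a byte order mark.
with_bom = "J".encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
```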
  897. Copyright (C) 2007, http://www.dabeaz.com 6- Unicode and Codecs • Unicode

    I/O always involves some encoding • Handled through codecs module 75 >>> f = codecs.open("data.txt","w","utf-8") >>> f.write(u"Hello World\n") >>> f.close() >>> f = codecs.open("data.txt","w","utf-16") >>> f.write(data) >>> • Several hundred character codecs are provided • Consult documentation for details
  898. Copyright (C) 2007, http://www.dabeaz.com 6- Unicode Encodings • Explicit encoding

    via strings 76 >>> a = u"Jalape\u00f1o" >>> enc_a = a.encode("utf-8") >>> • Example: Writing Unicode strings to a file >>> f = open(filename,"wb") >>> f.write(data.encode("utf-8")) • Note: Since encoding may contain binary data, should probably use binary file modes.
  899. Copyright (C) 2007, http://www.dabeaz.com 6- Unicode Decoding • Strings can

    also be decoded into Unicode 77 >>> enc_a = 'Jalape\xc3\xb1o' >>> a = enc_a.decode("utf-8") >>> a u'Jalape\xf1o' >>> • Example: Reading Unicode strings from a file >>> f = open(filename,"rb") >>> enc_data = f.read() >>> data = enc_data.decode("utf-8") • Again, be aware that encoded Unicode data may contain binary data
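The same round trip in Python 3, where text (str) and encoded data (bytes) are separate types. The sketch also shows what happens when the bytes are not valid for the chosen encoding; the Latin-1 sample is an assumption chosen for the illustration.

```python
a = "Jalape\u00f1o"
enc_a = a.encode("utf-8")           # str -> bytes
back = enc_a.decode("utf-8")        # bytes -> str

# Latin-1 bytes are not valid UTF-8, so strict decoding fails.
latin1 = a.encode("latin-1")
try:
    latin1.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# errors="replace" substitutes U+FFFD for the undecodable byte instead.
repaired = latin1.decode("utf-8", errors="replace")
```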
  900. Copyright (C) 2007, http://www.dabeaz.com 6- Finding the Encoding • How

    do you determine the encoding of a file? • Might be known in advance (strongly typed) • Often indicated in the file itself 78 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> • Depends on the data source, application, etc.
  901. Copyright (C) 2008, http://www.dabeaz.com 6- Regular Expressions • Virtually all

    dynamic languages have extensive support for text pattern processing with regular expressions • And in some languages, regular expressions are part of the language itself (e.g., Perl and Ruby). • Let's dig a little deeper... 80
  902. Copyright (C) 2008, http://www.dabeaz.com 6- The Problem • Many problems

    involve searching and matching specific text patterns. • Example: email addresses 81 Please send email to [email protected] and maybe you will get a response (maybe). • Example: URLs Go look on http://www.google.com for details. • Example: A U.S. phone number 773-555-1212
  903. Copyright (C) 2008, http://www.dabeaz.com 6- The Problem • Specifying and

    matching a text pattern is a much more complicated problem than looking for an exact substring • Must have a concise and easy way to specify the legal characters that make up a pattern along with the order in which they are supposed to appear 82
  904. Copyright (C) 2008, http://www.dabeaz.com 6- Solution : Regexs • A

    regular expression is a concise specification of a text pattern • Built from a few basic rules: 83 abc Matches the chars 'abc' exactly [chars] Match characters in a set [^chars] Match characters not in a set pat1|pat2 Matches either pat1 or pat2 pat* Zero or more repetitions of pat pat+ One or more repetitions of pat pat? Zero or one occurrence of pat (pat) A group that matches pat • These are then combined
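Each rule can be tried out in isolation. A minimal sketch using Python's re module (Python 3 syntax); the helper `full` and the sample strings are illustrative.

```python
import re

def full(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

matches = [
    full(r"abc", "abc"),        # literal characters
    full(r"[abc]", "b"),        # one character from a set
    full(r"[^abc]", "d"),       # one character not in the set
    full(r"cat|dog", "dog"),    # either alternative
    full(r"ab*", "a"),          # zero repetitions of b
    full(r"ab*", "abbb"),       # several repetitions of b
    full(r"ab+", "abb"),        # one or more b
    full(r"ab?", "ab"),         # optional b present
    full(r"(ab)+", "abab"),     # a group, repeated
]
non_matches = [
    full(r"ab+", "a"),          # + needs at least one b
    full(r"ab?", "abb"),        # ? allows at most one b
    full(r"[^abc]", "a"),       # a is in the excluded set
]
```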
  905. Copyright (C) 2008, http://www.dabeaz.com 6- Regex Example 84 • A

    pattern to match the title of an HTML doc <title>(.*?)</title> • Problem 1 : Print all matching lines <html> <head> <title>This is an example</title> </head> <body> ... </body> </html>
  906. Copyright (C) 2008, http://www.dabeaz.com 6- Regex in the Language 85

    • Perl open(INFILE,"foo.html"); while ($line = <INFILE>) { if ($line =~ /<title>(.*?)<\/title>/) { print $line; } } • Ruby f = open("foo.html") for line in f if line =~ /<title>(.*?)<\/title>/ print line end end
  907. Copyright (C) 2008, http://www.dabeaz.com 6- Regex as an object 86

    • Example: Python import re pat = re.compile('<title>(.*?)</title>') for line in open("foo.html"): if pat.search(line): print line • Here, the regex features are just in a library module. There is no special syntax or operators devoted to matching (it's a method)
  908. Copyright (C) 2008, http://www.dabeaz.com 6- Regex Example 87 • A

    pattern to match the title of an HTML doc <title>(.*?)</title> • Problem 2 : Extract just the title text itself <title>This is an example</title> This is an example
  909. Copyright (C) 2008, http://www.dabeaz.com 6- Groups • Regular expressions may

    define groups <title>(.*?)</title> ([\w-]+):(.*) • Groups are assigned numbers <title>(.*?)</title> ([\w-]+):(.*) 1 1 2 • Number determined left-to-right 88
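Group numbering can be confirmed with the second pattern above. A small sketch in Python 3; the header line being matched is a made-up sample.

```python
import re

m = re.match(r"([\w-]+):(.*)", "Content-Type: text/html")
name = m.group(1)      # first group: text before the colon
value = m.group(2)     # second group: everything after it

# With nested groups, numbers follow the position of each opening parenthesis.
nested = re.match(r"((a)(b))", "ab").groups()
```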
  910. Copyright (C) 2008, http://www.dabeaz.com 6- Group Extraction 89 • Perl

    open(INFILE,"foo.html"); while ($line = <INFILE>) { if ($line =~ /<title>(.*?)<\/title>/) { print $1,"\n"; } } • Ruby f = open("foo.html") for line in f if line =~ /<title>(.*?)<\/title>/ print Regexp.last_match(1),"\n" end end
  911. Copyright (C) 2008, http://www.dabeaz.com 6- Group Extraction 90 • Example:

    Python import re pat = re.compile('<title>(.*?)</title>') for line in open("foo.html"): m = pat.search(line) if m: print m.group(1)
  912. Copyright (C) 2008, http://www.dabeaz.com 6- Regex Example 91 • A

    pattern to match the title of an HTML doc <title>(.*?)</title> • Problem 3 : Change the title to subject <title>This is an example</title> <subject>This is an example</subject>
  913. Copyright (C) 2008, http://www.dabeaz.com 6- Text Substitution 92 • Perl

    open(INFILE,"foo.html"); while ($line = <INFILE>) { $line =~ s/<title>(.*?)<\/title>/<subject>$1<\/subject>/; print $line; } • Ruby f = open("foo.html") for line in f line.gsub!(/<title>(.*?)<\/title>/, '<subject>\1</subject>') print line end
  914. Copyright (C) 2008, http://www.dabeaz.com 6- Text Substitution 93 • Example:

    Python import re pat = re.compile('<title>(.*?)</title>') for line in open("foo.html"): line = pat.sub('<subject>\\1</subject>',line) print line,
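For reference, the three problems (match, extract, substitute) on a single in-memory line, written as a Python 3 sketch with raw strings so the backreference needs no double backslash:

```python
import re

pat = re.compile(r"<title>(.*?)</title>")
line = "<title>This is an example</title>"

found = pat.search(line) is not None                 # Problem 1: does it match?
title = pat.search(line).group(1)                    # Problem 2: extract group 1
changed = pat.sub(r"<subject>\1</subject>", line)    # Problem 3: substitute
```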
  915. Copyright (C) 2008, http://www.dabeaz.com 6- Commentary • Knowing how to

    use regular expressions is mostly a matter of reading the manual • All of the books on dynamic languages cover them • Most programmers have used them at some point • So, I'm not going to continue with a manual 94
  916. Copyright (C) 2008, http://www.dabeaz.com 6- Behind the Scenes • I

    think most programmers (including myself), think regular expressions involve some fairly serious magic • This is partly true....especially for some of the more hard-core features • However, how do they really work? • Is there anything interesting to be learned by looking into this? Well, maybe.... 95
  917. Copyright (C) 2008, http://www.dabeaz.com 6- Some History • Regular expressions

    originate in theoretical computer science--automata theory. • First appear sometime in the 1950s • They were popularized greatly by Ken Thompson who incorporated regex capabilities into the Unix ed editor (~1970) • They then propagated to other Unix tools (grep, awk, vi, lex, emacs, etc.) 96
  918. Copyright (C) 2008, http://www.dabeaz.com 6- The regex Library • A

    free software library written by Henry Spencer (~1985) • Used to build regex support in early versions of Perl and Tcl which then expanded upon/rewrote the library • Almost every modern language with regex support derives directly/indirectly from the Spencer library (or at least its approach) 97
  919. Copyright (C) 2008, http://www.dabeaz.com 6- Patterns -> NFA • Regular

    expression patterns are typically used to build an NFA • Non-deterministic Finite Automaton • Covered in great detail in theory course, but let's look at the general idea 98
  920. Copyright (C) 2008, http://www.dabeaz.com 6- Patterns to NFA • Example

    Pattern 99 ba*b b b > a start final • The resulting NFA • To match, you "run" the NFA against input
  921. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Example:

    Search for the pattern ba*b 100 Input: "aabaabbbab" b b > a start final You start in the initial state Beginning of input text
  922. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Start

    testing characters 101 Input: "aabaabbbab" b b > a start final fail
  923. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Start

    testing characters 102 Input: "aabaabbbab" b b > a start final fail
  924. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Now

    start moving through states 103 Input: "aabaabbbab" b b > a start final start
  925. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Keep

    going as long as there is a legal move 104 Input: "aabaabbbab" b b > a start final start
  926. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Keep

    going... 105 Input: "aabaabbbab" b b > a start final start
  927. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Keep

    going... 106 Input: "aabaabbbab" b b > a start final start fail! • In the current state, there is no b arrow.
  928. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Backtrack

    and try the other path out 107 Input: "aabaabbbab" b b > a start final start The unlabeled arrow here means that we can just move to the next state without reading any input
  929. Copyright (C) 2008, http://www.dabeaz.com 6- Running an NFA • Final

    state! - A Match 108 Input: "aabaabbbab" b b > a start final start end • If you can make it to the final state, it matches
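The walk-through above can be mimicked in a few lines. This is only a sketch: the state numbering and the placement of the unlabeled (empty) arrow are assumptions, since the original diagram does not survive in the transcript, and real regex engines are considerably more elaborate.

```python
EPS = None   # label for an arrow that consumes no input

# Assumed NFA for ba*b: 0 -b-> 1, 1 -a-> 1, 1 -(empty)-> 2, 2 -b-> 3 (final)
NFA = {
    0: [("b", 1)],
    1: [("a", 1), (EPS, 2)],
    2: [("b", 3)],
    3: [],
}
FINAL = 3

def run(state, text, pos):
    """Return the end position of a match from state at pos, or None."""
    if state == FINAL:
        return pos
    for label, target in NFA[state]:
        if label is EPS:
            end = run(target, text, pos)          # move without reading input
        elif pos < len(text) and text[pos] == label:
            end = run(target, text, pos + 1)      # consume one character
        else:
            continue                              # this arrow is not a legal move
        if end is not None:
            return end
        # otherwise backtrack and try the next arrow
    return None

def search(text):
    """Try each starting position in turn, as in the slides."""
    for start in range(len(text) + 1):
        end = run(0, text, start)
        if end is not None:
            return text[start:end]
    return None

found = search("aabaabbbab")
```

On the slide's input the first two starting positions fail, and the match found from the third position is `baab`.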
  930. Copyright (C) 2008, http://www.dabeaz.com 6- Building NFAs • All regular

    expression patterns can be turned into an NFA using a few primitive building blocks • It's tricky, but relatively straightforward • You can read details 109
  931. Copyright (C) 2008, http://www.dabeaz.com 6- Pathological Cases • Almost all

    dynamic languages use an approach to NFA matching known as "recursive backtracking" • This involves trying all possibilities until a suitable match is found • And it can lead to some pathological cases • Certain patterns that match very poorly 110
  932. Copyright (C) 2008, http://www.dabeaz.com 6- Example: • A pathological pattern

    111 a?a?a?aaa • And the NFA > a a a a a a These arrows mean the "a" is optional
  933. Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Match text

    114 aaa • Step 3 > a a a a a a At this point, there is no more text, and we're not in the final state (a failure)
  934. Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Match text

    115 • Step 4 (Backtrack) > a a a a a a Try the empty arrow here aaa
  935. Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Match text

    116 • Step 5 > a a a a a a This fails again. Not final, no more text. aaa
  936. Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Match text

    117 • Step 6 - Backtrack > a a a a a a aaa Go back and try this empty arrow
  937. Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Match text

    122 • Step 11 > a a a a a a aaa Fails. Backtrack • You're starting to get the idea...
  938. Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Match text

    123 • Eventually will arrive here... > a a a a a a aaa A match! • Took many guesses • Re-reading of the input string
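To get a feel for how quickly the guessing grows, the sketch below counts the attempts a naive backtracking matcher makes for the generalized pattern (n optional a's followed by n required a's) against a string of n a's. The matcher is a hypothetical toy written for this illustration in Python 3, not the algorithm of any particular regex library.

```python
def count_steps(n):
    """Match a?{n}a{n} against 'a'*n by backtracking; return (matched, steps)."""
    pattern = [True] * n + [False] * n    # True = optional 'a', False = required 'a'
    text = "a" * n
    steps = 0

    def match(i, pos):
        nonlocal steps
        steps += 1
        if i == len(pattern):
            return pos == len(text)       # success only if all input was used
        # First guess: this item consumes an 'a'.
        if pos < len(text) and match(i + 1, pos + 1):
            return True
        # Second guess (optional items only): take the empty arrow.
        if pattern[i] and match(i + 1, pos):
            return True
        return False

    return match(0, 0), steps

matched_3, steps_3 = count_steps(3)
matched_10, steps_10 = count_steps(10)
```

The only successful path skips every optional item, and the matcher tries that combination last, so the number of attempts grows exponentially with n.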
  939. Copyright (C) 2008, http://www.dabeaz.com 6- Recursive Backtracking • Reading: 124

    Russ Cox, "Regular Expression Matching Can be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)"
  940. Copyright (C) 2008, http://www.dabeaz.com 6- Wrap-up 126 • Next time,

    will look at some functional programming issues
  941. Copyright (C) 2008, http://www.dabeaz.com 7- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  942. Copyright (C) 2008, http://www.dabeaz.com 7- Introduction 3 • Over the

    last few classes, we have spent a lot of time looking at "objects" • An object encapsulates data and has a collection of methods that operate on that data • However, this is not the only way to do it • Let's return to functions...
  943. Copyright (C) 2008, http://www.dabeaz.com 7- Programming with Functions 4 •

    It turns out that you can do a lot of very useful programming just using functions • "Functional programming" • Mathematicians study functions a lot • However, there are some essential features that you need to move beyond the basics • Today, we'll look at it in a little more detail
  944. Copyright (C) 2008, http://www.dabeaz.com 7- Confession 5 • The more

    I program, the more I find myself drawn towards functional programming • It feels more logically coherent than OO • And a lot less byzantine • Plus, I have a secret past as a math major • Oh yeah, and get off my lawn!
  945. Copyright (C) 2008, http://www.dabeaz.com 7- Disclaimer 6 • Functional programming

    is a HUGE topic • Which can be highly mathematical • I'm not going to take that approach... • Especially with my brain pounding cold • However, I will try to cover a few absolute basics and show some interesting examples
  946. Copyright (C) 2008, http://www.dabeaz.com 7- Other Disclaimer 7 • Almost

    all examples are going to be Python • Python is by no means considered to be a purely "functional" language • However, it has enough of the core features for me to illustrate some interesting things • People often remark on its freakish similarity to certain parts of Lisp
  947. Copyright (C) 2008, http://www.dabeaz.com 7- Review : Functions 9 •

    A function is a series of statements def foo(x): statements ... some calculation ... statements return result • A function receives input arguments • Performs some kind of calculation • Returns a result
  948. Copyright (C) 2008, http://www.dabeaz.com 7- Functions as Objects 10 •

    A function is also an object that you can treat like it was ordinary data def square(x): return x*x • You can assign it to a variable s = square • Put it in a list items = [1,"Hello",square] • Pass it as an argument to another function y = foo(3,square)
  949. Copyright (C) 2008, http://www.dabeaz.com 7- Functions as Objects 11 •

    In fact, anything you can do with other objects, you can also do with a function object • The only difference is that the contents of a function don't look like anything you're used to (number, string, array, etc.) • In reality, it's just a sequence of statements
  950. Copyright (C) 2008, http://www.dabeaz.com 7- First-Class Functions 12 • If

    functions have equal footing with numbers, strings, and other core datatypes, then they're said to be "first-class" • Basically, it means that functions are nothing special---they're just like anything else in the language
  951. Copyright (C) 2008, http://www.dabeaz.com 7- Callbacks 13 • Having first-class

    functions lets you pass functions into other functions as an argument • This allows a program to make use of so-called "callback" functions • Functions that get executed under certain circumstances by another function
  952. Copyright (C) 2008, http://www.dabeaz.com 7- Callback Example 14 • Classic

    use case: Supplying the comparison function for a list sort def wordcmp(s,t): s_l = s.lower() t_l = t.lower() if s_l < t_l : return -1 elif s_l > t_l : return 1 else: return 0 words = ['MONDO','diabolical','Thrash'] words.sort(wordcmp) # Produces ['diabolical','MONDO','Thrash'] • sort() "calls back" into the compare function to help it figure out the ordering
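In Python 3, sort() no longer accepts a comparison function directly. A sketch of two ways to keep the callback idea: wrap the comparison with functools.cmp_to_key, or pass a key function instead.

```python
from functools import cmp_to_key

def wordcmp(s, t):
    """Case-insensitive comparison, same logic as in the slide."""
    s_l = s.lower()
    t_l = t.lower()
    if s_l < t_l:
        return -1
    elif s_l > t_l:
        return 1
    else:
        return 0

words = ['MONDO', 'diabolical', 'Thrash']

by_cmp = sorted(words, key=cmp_to_key(wordcmp))   # comparison callback
by_key = sorted(words, key=str.lower)             # key callback
```

Both versions still hand a function to the sorting code, which calls back into it whenever it needs to order two items.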
  953. Copyright (C) 2008, http://www.dabeaz.com 7- Functions as Data 15 •

    The fact that functions are data opens up a variety of interesting possibilities • Can have stored tables and collections of functions (already saw that with classes) • Also, functions can be passed around to different parts of a program • At first glance, all of this might sound a little exotic (but we'll see examples soon)
  954. Copyright (C) 2008, http://www.dabeaz.com 7- Inner Functions 17 • Inside

    a function, you can define new functions and use them elsewhere (like data) • Example: def make_greeting(name): def greet(): print "Hey %s, get off my lawn!" % name return greet • Check it out - a function was returned >>> p = make_greeting("Punk") >>> p <function greet at 0x69330> >>>
  955. Copyright (C) 2008, http://www.dabeaz.com 7- Calling Inner Functions 18 •

    Using an "inner function" is interesting • It secretly carries information about all of the variables that were alive when it was defined >>> p = make_greeting("Punk") >>> p <function greet at 0x69330> >>> p() Hey Punk, get off my lawn! >>> Notice how it somehow picked up the name variable
  956. Copyright (C) 2008, http://www.dabeaz.com 7- Closures 19 • A function

    together with its surrounding environment is known as a "closure" • Basically, the closure has all of the information needed to make the function execute correctly • Normally all of this is tucked away behind the scenes (it just works)
  957. Copyright (C) 2008, http://www.dabeaz.com 7- Closures 20 • You can

    inspect the closure if sneaky >>> p = make_greeting("Punk") >>> p <function greet at 0x69330> >>> p.func_closure (<cell at 0x6c950: str object at 0x6c9e0>,) >>> p.func_closure[0].cell_contents 'Punk' >>> • A closure is almost like a weird kind of "object" >>> k = make_greeting("Kid") >>> g = make_greeting("Governor") >>> g() Hey Governor, get off my lawn! >>> k() Hey Kid, get off my lawn! >>>
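In Python 3 the attribute is spelled `__closure__` rather than `func_closure`. A minimal sketch of the same inspection; here greet() returns the string instead of printing it so the result is easy to check.

```python
def make_greeting(name):
    def greet():
        return "Hey %s, get off my lawn!" % name
    return greet

p = make_greeting("Punk")
k = make_greeting("Kid")

captured = p.__closure__[0].cell_contents   # the value of name that p carries
messages = (p(), k())                       # each closure keeps its own name
```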
  958. Copyright (C) 2008, http://www.dabeaz.com 7- Interlude 21 • So far,

    just scratched the surface of what it means for functions to be "first-class" • You can pass existing functions around as data • You can create new functions on-the-fly • Newly created functions retain parts of the environment where they were created • This is where it starts to get interesting...
  959. Copyright (C) 2008, http://www.dabeaz.com 7- Observation 23 • In a

    lot of programs, it seems like you collect a bunch of data into a list • You then apply different operations to the list data to get some new data
  960. Copyright (C) 2008, http://www.dabeaz.com 7- Example: Stock Portfolio 24 •

    In the assignment, you wrote some programs that worked with a portfolio of stocks • There was some data in a file (a list of lines) MSFT,100,54.25 IBM,50,91.10 AA,25,23.10 CAT,75,70.13 MSFT,50,64.23 GM,200,45.11 HPQ,80,37.42 IBM,40,88.20 PG,125,56.22 BA,75,92.72 MSFT,50,71.21 AIG,40,41.81
  961. Copyright (C) 2008, http://www.dabeaz.com 7- Lists as a Data Structure

    25 • And in the assignment, it was natural to turn that file into a list (of lists perhaps) portfolio = [ ['MSFT', 100, 54.25], ['IBM' , 50 , 91.10], ['AA' , 25 , 23.10], ['CAT' , 75 , 70.13], ['MSFT', 50 , 64.23], ['GM' , 200, 45.11], ['HPQ' , 80 , 37.42], ['IBM' , 40 , 88.20], ['PG' , 125, 56.22], ['BA' , 75 , 92.72], ['MSFT', 50 , 71.21], ['AIG' , 40 , 41.81] ]
  962. Copyright (C) 2008, http://www.dabeaz.com 7- Example Calculation 26 • Example:

    Calculate the cost of the portfolio total = 0.0 for stock in portfolio: total += stock[1]*stock[2] print "Total", total • Involves a list of "stocks" • Iterating over this list • Performing an "operation" on each item
  963. Copyright (C) 2008, http://www.dabeaz.com 7- List Operations 27 • Most

    common list operations can be distilled down to three basic ops • mapping • filtering • reduction
  964. Copyright (C) 2008, http://www.dabeaz.com 7- map() 28 • An operation

    that maps a function to each item of a list, producing a new list def map(func, items): result = [] for it in items: result.append(func(it)) return result • Example: def square(x): return x*x nums = [1,2,3,4,5] sqs = map(square,nums) # [1,4,9,16,25]
  965. Copyright (C) 2008, http://www.dabeaz.com 7- filter() 29 • An operation

    that checks each item and discards those that don't match a condition def filter(condf,items): result = [] for it in items: if condf(it): result.append(it) return result • Example: def positive(x): return x > 0 nums = [1,-2,3,-4,5] p = filter(positive,nums) # [1,3,5]
  966. Copyright (C) 2008, http://www.dabeaz.com 7- reduce() 30 • An operation

    that combines successive list elements and produces a single result def reduce(combinef,items,initial=0): result = initial for it in items: result = combinef(result,it) return result • Example: def add(x,y): return x+y nums = [1,2,3,4,5] total = reduce(add,nums) # total = 15
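Python ships its own versions of all three operations. A sketch using them in Python 3, where map() and filter() return lazy iterators (hence the list() calls) and reduce() lives in functools:

```python
from functools import reduce

def square(x):
    return x * x

def positive(x):
    return x > 0

def add(x, y):
    return x + y

sqs = list(map(square, [1, 2, 3, 4, 5]))
pos = list(filter(positive, [1, -2, 3, -4, 5]))
total = reduce(add, [1, 2, 3, 4, 5])
```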
  967. Copyright (C) 2008, http://www.dabeaz.com 7- Example 31 • Calculate the

    total cost of all stocks in the portfolio with 100 or more shares # Three functions def cost_f(s): return s[1]*s[2] def hundred_f(s): return s[1] >= 100 def add_f(x,y): return x+y # Now, using these operations stocks = filter(hundred_f, portfolio) costs = map(cost_f,stocks) total = reduce(add_f,costs) • It's a little clunky, but essentially applying operations to entire lists at each step
  968. Copyright (C) 2008, http://www.dabeaz.com 7- Commentary 32 • Every dynamic

    language already has some variation of map, filter, and reduce • And some common reductions (sum, min, max) • These are basic list/array operations • Have been around for almost forever. • Look in the manual for details.
  969. Copyright (C) 2008, http://www.dabeaz.com 7- Problem 34 • Since functions

    can be easily passed around, you often end up writing code that relies on a lot of small functions or "formulas" # Three functions def cost_f(s): return s[1]*s[2] def hundred_f(s): return s[1] >= 100 def add_f(x,y): return x+y # Now, using these operations stocks = filter(hundred_f, portfolio) costs = map(cost_f,stocks) total = reduce(add_f,costs) • This style gets old real fast...
  970. Copyright (C) 2008, http://www.dabeaz.com 7- Solution : Lambda 35 •

    Lambda expressions. Creates a function right on the spot for you # Now, using these operations stocks = filter(lambda s: s[1] >= 100, portfolio) costs = map(lambda s: s[1]*s[2], stocks) total = reduce(lambda x,y: x+y,costs) • Lambda creates a function that is a single expression lambda x,y : x+y # Is the same as typing this out long-form def anon(x,y): return x+y
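A runnable version of the lambda pipeline, assuming Python 3 and a four-row subset of the portfolio data shown earlier:

```python
from functools import reduce

# Subset of the portfolio from the earlier slide: [name, shares, price]
portfolio = [
    ['MSFT', 100, 54.25],
    ['IBM',   50, 91.10],
    ['GM',   200, 45.11],
    ['PG',   125, 56.22],
]

stocks = filter(lambda s: s[1] >= 100, portfolio)   # keep 100 or more shares
costs = map(lambda s: s[1] * s[2], stocks)          # shares * price
total = reduce(lambda x, y: x + y, costs, 0.0)      # add them up
```

Nothing is computed until reduce() pulls values through the map and filter iterators, and the total comes to roughly 21474.5.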
  971. Copyright (C) 2008, http://www.dabeaz.com 7- Lambda 36 • Don't read

    too much into this lambda stuff • It's just a special syntax that lets us take a simple expression and quickly turn it into an unnamed function • Often more convenient than defining a separate function elsewhere • Name "lambda" comes from Lisp which comes from the "Lambda Calculus"
  972. Copyright (C) 2008, http://www.dabeaz.com 7- Using Lambda 37 • Lambda

    is interesting because it's actually an expression • You can use it anywhere an expression goes ops = { '+' : lambda x,y: x+y, '-' : lambda x,y: x-y, '*' : lambda x,y: x*y, '/' : lambda x,y: x/y } >>> ops['+'](3,4) 7 >>> ops['*'](3,4) 12 >>>
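A table of lambdas like this is already most of a tiny calculator. The sketch below uses it to evaluate postfix expressions; the function `eval_rpn` is a hypothetical helper added for illustration.

```python
ops = {
    '+': lambda x, y: x + y,
    '-': lambda x, y: x - y,
    '*': lambda x, y: x * y,
    '/': lambda x, y: x / y,
}

def eval_rpn(tokens):
    """Evaluate a postfix expression such as ['3', '4', '+']."""
    stack = []
    for tok in tokens:
        if tok in ops:
            y = stack.pop()
            x = stack.pop()
            stack.append(ops[tok](x, y))   # look the function up and call it
        else:
            stack.append(float(tok))
    return stack.pop()

result = eval_rpn("3 4 + 2 *".split())     # (3 + 4) * 2
```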
  973. Copyright (C) 2008, http://www.dabeaz.com 7- Lambda Use 38 • In

    reality, you probably want to use it sparingly • Overuse makes code impossible to decipher • Not as powerful as defining a normal function • The body of a lambda can only be a single expression (not a bunch of statements)
  974. Copyright (C) 2007, http://www.dabeaz.com 7- Discussion • map/filter operations are

    very common • Is there an even more convenient way to perform this operation? 40
  975. Copyright (C) 2007, http://www.dabeaz.com 7- List Comprehensions • Creates a

    new list by applying an operation to each element of a sequence. >>> a = [1,2,3,4,5] >>> b = [2*x for x in a] >>> b [2,4,6,8,10] >>> • Another example: 41 >>> names = ['Elwood','Jake'] >>> a = [name.lower() for name in names] >>> a ['elwood','jake'] >>>
  976. Copyright (C) 2007, http://www.dabeaz.com 7- List Comprehensions • A list

    comprehension can also filter >>> f = open("stockreport","r") >>> goog = [line for line in f if 'GOOG' in line] >>> >>> a = [1, -5, 4, 2, -2, 10] >>> b = [2*x for x in a if x > 0] >>> b [2,8,4,20] >>> • Another example 42
  977. Copyright (C) 2007, http://www.dabeaz.com 7- List Comprehensions • General syntax

    [expression for x in s if condition] • What it means result = [] for x in s: if condition: result.append(expression) 43 • Basically, this is map/filter rolled into one op
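To make the "map/filter rolled into one" claim concrete, here is a small sketch in Python 3 syntax that computes the same result three ways. The input list is the one from the filtering slide; the variable names are just for illustration.

```python
a = [1, -5, 4, 2, -2, 10]

# Comprehension form: map and filter in one expression.
b = [2 * x for x in a if x > 0]

# The equivalent explicit loop.
result = []
for x in a:
    if x > 0:
        result.append(2 * x)

# The same thing with filter() and map(). In Python 3 these return
# lazy iterators, so list() is needed to materialize the values.
c = list(map(lambda x: 2 * x, filter(lambda x: x > 0, a)))

print(b, result, c)
```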
  978. Copyright (C) 2007, http://www.dabeaz.com 7- List Comprehensions • The general

    syntax (in full) [expression for x in s if cond1 for y in t if cond2 ... if condfinal] • What it means result = [] for x in s: if cond1: for y in t: if cond2: if condfinal: result.append(expression) 44
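A short sketch of the full form with two `for` clauses, in Python 3 syntax. The sample sequences are made up; the point is only that the leftmost `for` becomes the outermost loop.

```python
s = [1, 2, 3]
t = ['a', 'b']

# Two for-clauses: the leftmost one is the outermost loop.
pairs = [(x, y) for x in s if x != 2 for y in t]

# The same thing written out as nested loops.
expanded = []
for x in s:
    if x != 2:
        for y in t:
            expanded.append((x, y))

print(pairs)
```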
  979. Copyright (C) 2007, http://www.dabeaz.com 7- List Comp: Examples • List

    comprehensions are hugely useful • Collecting the values of a specific field stocknames = [s['name'] for s in stocks] • Performing database-like queries a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ] • Quick mathematics over sequences cost = sum([s['shares']*s['price'] for s in stocks]) 45
  980. Copyright (C) 2007, http://www.dabeaz.com 7- Historical Digression • List comprehensions

    come from Haskell
        a = [x*x for x in s if x > 0]     # Python
        a = [x*x | x <- s, x > 0]         # Haskell
    • And this is motivated by sets (from math)
        a = { x² | x ∈ s, x > 0 }
  981. Copyright (C) 2007, http://www.dabeaz.com 7- Big Idea: Being Declarative •

    List comprehensions encourage a more "declarative" style of programming when processing sequences of data. • Data can be manipulated by simply "declaring" a series of statements that perform various operations on it. • Although, it may require some care... 47
  982. Copyright (C) 2007, http://www.dabeaz.com 7- Example • Reading a portfolio

    of stocks lines = open("dowportfolio.csv") fields = [line.split(",") for line in lines] portfolio = [[f[0],int(f[1]),float(f[2])] for f in fields] 48 • Performing a calculation total = sum([s[1]*s[2] for s in portfolio if s[1] >= 100]) • We're just applying list operation after list operation to get the result we want
  983. Copyright (C) 2008, http://www.dabeaz.com 7- Closures Revisited • Consider this

    example: 50 def add(x,y): def do_add(): return x+y return do_add • This function creates a new function that performs a calculation when it runs (later) >>> r = add(3,4) >>> r <function do_add at 0x693b0> >>> r() 7 >>>
  984. Copyright (C) 2008, http://www.dabeaz.com 7- Lazy Evaluation • The last

    example illustrates something known as "lazy" evaluation • A function was created to perform some work • But the execution of the function didn't occur until later on (it was delayed) • This style of programming can be used for all sorts of good and evil 51
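One way to see the delay is to record when the work actually happens. This is a sketch in Python 3 syntax; `expensive`, `delayed`, and the `calls` list are hypothetical names used only to make the timing visible.

```python
calls = []    # records each time the real work is performed

def expensive(x, y):
    calls.append((x, y))
    return x + y

def delayed(x, y):
    # Nothing is computed here; x and y are only captured in a closure.
    def run():
        return expensive(x, y)
    return run

r = delayed(3, 4)
before = len(calls)    # the work has not happened yet
value = r()            # now it runs
after = len(calls)

print(before, value, after)
```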
  985. Copyright (C) 2008, http://www.dabeaz.com 7- Example 52 • Packaging up

    expensive calculations in a way where they will only be carried out if actually requested later • Example : Fetch a URL, but, not right now import urllib def prepare_download(url): def do_download(): return urllib.urlopen(url).read() return do_download >>> d = prepare_download("http://www.blah.com") ... >>> text = d() # Okay, do it
  986. Copyright (C) 2008, http://www.dabeaz.com 7- Example 53 • Supply only

    some of the arguments to a function now (get the rest later)
        def partial(func,*args):
            def call(*moreargs):
                return func(*(args+moreargs))
            return call
    • Example:
        def add(x,y,z):
            return x+y+z
        a = partial(add,2,3)
        ...
        print a(4)     # prints 9 : 2 + 3 + 4
        print a(10)    # prints 15 : 2 + 3 + 10
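For comparison, the standard library has shipped the same idea as `functools.partial` since Python 2.5. The sketch below (Python 3 syntax) puts a hand-written closure version next to it. Note that the inner function takes only the later arguments, since `func` is already captured by the closure.

```python
from functools import partial as std_partial

def partial(func, *args):
    # Hand-written version: remember the early arguments in a closure.
    def call(*moreargs):
        return func(*(args + moreargs))
    return call

def add(x, y, z):
    return x + y + z

a = partial(add, 2, 3)        # closure version
b = std_partial(add, 2, 3)    # standard library version

print(a(4), b(4))
print(a(10), b(10))
```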
  987. Copyright (C) 2008, http://www.dabeaz.com 7- Example 54 • tail -f

    a logfile import time def tail(thefile): thefile.seek(0,2) # Go to EOF def do_next(): while True: line = thefile.readline() if line: return line time.sleep(0.1) return do_next • Example: >>> next = tail(open("logfile","r")) >>> while True: ... print next(),
  988. Copyright (C) 2008, http://www.dabeaz.com 7- Discussion 56 • The tail

    -f example was interesting • That function created a function which emitted a new line from a file every time you called it • You might be able to expand on that idea by writing functions that generate sequences
  989. Copyright (C) 2008, http://www.dabeaz.com 7- Generator Example 57 • Example

    : A Countdown def countdown(n): while n > 0: yield n n = n - 1 • This spits out new values for use in a for-loop • Example: >>> c = countdown(5) >>> for n in c: ... print n, ... 5 4 3 2 1 >>>
  990. Copyright (C) 2008, http://www.dabeaz.com 9- Generator Expressions • A generator

    version of a list comprehension
        >>> a = [1,2,3,4]
        >>> b = (2*x for x in a)
        >>> b
        <generator object at 0x58760>
        >>> for i in b: print i,
        ...
        2 4 6 8
        >>>
    • Important differences
        • Does not construct a list.
        • Only useful purpose is iteration
        • Once consumed, can't be reused
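The "once consumed, can't be reused" point is easy to trip over, so here is a tiny sketch of it in Python 3 syntax. The variable names are illustrative only.

```python
a = [1, 2, 3, 4]
b = (2 * x for x in a)

first_pass = list(b)     # consumes the generator
second_pass = list(b)    # nothing left to produce

# A list comprehension builds a real list, which can be re-read.
c = [2 * x for x in a]
c_twice = (list(c), list(c))

print(first_pass, second_pass)
```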
  991. Copyright (C) 2008, http://www.dabeaz.com 9- Generator Expressions • General syntax

    (expression for i in s for j in t ... if conditional) • Can also serve as a function argument sum(x*x for x in a) • Can be applied to any iterator >>> a = [1,2,3,4] >>> b = (x*x for x in a) >>> c = (-x for x in b) >>> for i in c: print i, ... -1 -4 -9 -16 >>> 59
  992. Copyright (C) 2008, http://www.dabeaz.com 9- Generator Expressions • Example: Sum

    a field in a large input file
        823.1838823 233.128883 14.2883881 44.1787723 377.1772737 123.177277 143.288388 3884.78772 ...
    • Solution
        f = open("datfile.txt")
        # Strip all lines that start with a comment
        lines = (line for line in f if not line.startswith('#'))
        # Split the lines into fields
        fields = (s.split() for s in lines)
        # Sum up one of the fields
        print sum(float(f[2]) for f in fields)
  993. Copyright (C) 2008, http://www.dabeaz.com 9- Generator Expressions • Solution 61

    • Each generator expression only evaluates data as needed (lazy evaluation) • Example: Running above on a 6GB input file only consumes about 60K of RAM f = open("datfile.txt") # Strip all lines that start with a comment lines = (line for line in f if not line.startswith('#')) # Split the lines into fields fields = (s.split() for s in lines) # Sum up one of the fields print sum(float(f[2]) for f in fields)
  994. Copyright (C) 2008, http://www.dabeaz.com 9- Commentary • With generators, you

    start to think of setting up functions as a processing pipeline • Almost like using pipes in Unix • A larger example follows shortly 62
  995. Copyright (C) 2008, http://www.dabeaz.com 9- Wrap-up • This has only

    been a small taste of functional programming idioms • If you go further, focus on organization of functions, closures, routing of data, etc. • Personally, I think it's a fun way to program • Very different than OO however... 64
  996. Copyright (C) 2008, http://www.dabeaz.com 8- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  997. Copyright (C) 2008, http://www.dabeaz.com 8- Overview 3 • Going to

    look at more examples • More with using generators/iterators • Introduction to network programming
  998. Copyright (C) 2008, http://www.dabeaz.com 8- Generators 5 • Last time,

    we ended with some discussion of generator functions • However, didn't get a chance to look at more interesting examples • Let's spend a little more time on this
  999. Copyright (C) 2008, http://www.dabeaz.com 8- Generator Example 6 • Example

    : A Countdown def countdown(n): while n > 0: yield n n = n - 1 • This spits out new values for use in a for-loop • Example: >>> c = countdown(5) >>> for n in c: ... print n, ... 5 4 3 2 1 >>>
  1000. Copyright (C) 2008, http://www.dabeaz.com 8- Ruby is Similar 7 •

    Example : A Countdown class Countdown def initialize(n) @start = n end def each n = @start while n > 0 yield n n -= 1 end end end • Use for i in Countdown.new(5) puts i end
  1001. Copyright (C) 2008, http://www.dabeaz.com 8- Generator Expressions • A generator

    version of a list comprehension
        >>> a = [1,2,3,4]
        >>> b = (2*x for x in a)
        >>> b
        <generator object at 0x58760>
        >>> for i in b: print i,
        ...
        2 4 6 8
        >>>
    • Generates a sequence of values where some operation has been applied
  1002. Copyright (C) 2008, http://www.dabeaz.com 8- Question • Aside from being

    an exotic language feature, how do you go about using generators? • Is there any practical use of this? 9
  1003. Copyright (C) 2008, http://www.dabeaz.com 8- Generators as a Pipeline •

    Generators are most effectively used to set up data processing pipelines • Similar to pipes in Unix 10 % ls -l | wc • Can structure programs as stages of processing chained together
  1004. Copyright (C) 2008, http://www.dabeaz.com 8- Programming Problem 11 Find out

    how many bytes of data were transferred by summing up the last column of data in this Apache web server log 81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587 81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133 81.107.39.38 - ... "GET /ply/bookplug.gif HTTP/1.1" 200 23903 81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238 81.107.39.38 - ... "GET /ply/example.html HTTP/1.1" 200 2359 66.249.72.134 - ... "GET /index.html HTTP/1.1" 200 4447 Oh yeah, and the log file might be huge (Gbytes)
  1005. Copyright (C) 2008, http://www.dabeaz.com 8- The Log File • Each

    line of the log looks like this:
        81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
    • The number of bytes is the last column
        bytestr = line.rsplit(None,1)[1]
    • It's either a number or a missing value (-)
        81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -
    • Converting the value
        if bytestr != '-':
            bytes = int(bytestr)
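The two steps above can be wrapped in one small helper. This is a sketch in Python 3 syntax; `bytes_sent` is a made-up name, and treating a missing value as 0 is an assumption that suits summing (the slides simply skip those lines).

```python
def bytes_sent(line):
    """Return the byte count in the last column, or 0 if it is '-'."""
    # rsplit(None, 1) splits once, from the right, on whitespace.
    bytestr = line.rsplit(None, 1)[1]
    return int(bytestr) if bytestr != '-' else 0

# Two sample lines in the shape shown on the slides.
sample = [
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587',
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -',
]

counts = [bytes_sent(line) for line in sample]
print(counts, sum(counts))
```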
  1006. Copyright (C) 2008, http://www.dabeaz.com 8- A Non-Generator Soln • Just

    do a simple for-loop 13 wwwlog = open("access-log") total = 0 for line in wwwlog: bytestr = line.rsplit(None,1)[1] if bytestr != '-': total += int(bytestr) print "Total", total • We read line-by-line and just update a sum
  1007. Copyright (C) 2008, http://www.dabeaz.com 8- A Generator Solution • Let's

    use some generator expressions 14 wwwlog = open("access-log") bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog) bytes = (int(x) for x in bytecolumn if x != '-') print "Total", sum(bytes) • Well, that's certainly different • Less code • A completely different programming style
  1008. Copyright (C) 2008, http://www.dabeaz.com 8- Generators as a Pipeline •

    The solution is setting up a pipeline
        access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total
    • Each step is defined by iteration/generation
        wwwlog = open("access-log")
        bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
        bytes = (int(x) for x in bytecolumn if x != '-')
        print "Total", sum(bytes)
  1009. Copyright (C) 2008, http://www.dabeaz.com 8- Being Declarative • At each

    step of the pipeline, we declare an operation that will be applied to the entire input stream
        access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total
        bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    This operation gets applied to every line of the log file
  1010. Copyright (C) 2008, http://www.dabeaz.com 8- Being Declarative • Instead of

    focusing on the problem at a line-by-line level, you just break it down into big operations that operate on the whole file • This is very much a "declarative" style • The key : Think big... 17
  1011. Copyright (C) 2008, http://www.dabeaz.com 8- Iteration is the Glue 18

    • The glue that holds the pipeline together is the iteration that occurs in each step wwwlog = open("access-log") bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog) bytes = (int(x) for x in bytecolumn if x != '-') print "Total", sum(bytes) • The calculation is being driven by the last step • The sum() function is consuming values being pushed through the pipeline (via .next() calls)
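A small sketch that makes "driven by the last step" visible, in Python 3 syntax (where the `.next()` method is spelled `next(g)` or `g.__next__()`). The `source` generator and the `log` list are invented here purely to trace when values are produced.

```python
log = []

def source():
    for n in [1, 2, 3]:
        log.append("produce %d" % n)
        yield n

# Building the pipeline runs none of the generator body.
doubled = (2 * n for n in source())
log_before = list(log)

# The consumer pulls values through the pipeline one at a time.
total = sum(doubled)

# Pulling by hand does the same thing one step at a time.
g = (x * x for x in [1, 2, 3])
first = next(g)

print(log_before, total, log, first)
```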
  1012. Copyright (C) 2008, http://www.dabeaz.com 8- Performance • Surely, this generator

    approach has all sorts of fancy-dancy magic that is slow. • Let's check it out on a 1 GB log file... 19
  1013. Copyright (C) 2008, http://www.dabeaz.com 8- Performance Contest 20 wwwlog =

    open("access-log") total = 0 for line in wwwlog: bytestr = line.rsplit(None,1)[1] if bytestr != '-': total += int(bytestr) print "Total", total wwwlog = open("access-log") bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog) bytes = (int(x) for x in bytecolumn if x != '-') print "Total", sum(bytes) 21.19s 20.14s Time Time
  1014. Copyright (C) 2008, http://www.dabeaz.com 8- Programming Problem 21 Write a

    program that can easily extract metadata from Firefox browser cache files • You were supposed to do this in the assignment • Probably encountered a variety of nasty bits of code with that
  1015. Copyright (C) 2008, http://www.dabeaz.com 8- The Firefox Cache • There

    are four critical files 22 _CACHE_MAP_ # Cache index _CACHE_001_ # Cache data _CACHE_002_ # Cache data _CACHE_003_ # Cache data • All files are binary-encoded • _CACHE_MAP_ is the index, but it is encoded in a tricky way. • Don't need it to extract URL requests anyways
  1016. Copyright (C) 2008, http://www.dabeaz.com 8- Firefox _CACHE_ Files • _CACHE_00n_

    file organization
        Free/used block bitmap    4096 bytes
        Blocks                    Up to 32768 blocks
    • The block size varies according to the file:
        _CACHE_001_    256 byte blocks
        _CACHE_002_    1024 byte blocks
        _CACHE_003_    4096 byte blocks
  1017. Copyright (C) 2008, http://www.dabeaz.com 8- Cache Metadata • Metadata is

    encoded as a binary structure
        Header            36 bytes
        Request String    Variable length (in header)
        Request Info      Variable length (in header)
    • Header encoding (binary, big-endian)
        Bytes    Field          Type
        0-3      magic          unsigned int (0x00010008)
        4-7      location       unsigned int
        8-11     fetchcount     unsigned int
        12-15    fetchtime      unsigned int (system time)
        16-19    modifytime     unsigned int (system time)
        20-23    expiretime     unsigned int (system time)
        24-27    datasize       unsigned int (byte count)
        28-31    requestsize    unsigned int (byte count)
        32-35    infosize       unsigned int (byte count)
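The header layout maps directly onto a `struct` format string: nine big-endian unsigned 32-bit integers is `">9I"`. The sketch below (Python 3 syntax) packs a header from values that appear in the sample output on a later slide and unpacks it into named fields. The `CacheHeader` namedtuple is an illustrative convenience, not something the slides use.

```python
import struct
from collections import namedtuple

HEADER_FORMAT = ">9I"    # 9 big-endian unsigned 32-bit ints = 36 bytes

CacheHeader = namedtuple(
    "CacheHeader",
    "magic location fetchcount fetchtime modifytime "
    "expiretime datasize requestsize infosize")

# Build a 36-byte header by hand, purely for illustration.
raw = struct.pack(HEADER_FORMAT, 0x00010008, 0, 1,
                  1192892514, 1192892514, 0, 1524, 83, 225)

header = CacheHeader(*struct.unpack(HEADER_FORMAT, raw))

print(len(raw), hex(header.magic), header.requestsize, header.infosize)
```

Naming the fields avoids index lookups such as `header[7]` and `header[8]` when the request and info lengths are needed later.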
  1018. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Headers 25

        import os, struct
        cachefiles = [('_CACHE_001_',256),('_CACHE_002_',1024),
                      ('_CACHE_003_',4096)]
        def generate_headers(cachedir):
            for name, blocksize in cachefiles:
                pathname = os.path.join(cachedir,name)
                f = open(pathname,"rb")
                f.seek(4096)
                while True:
                    header = f.read(36)
                    if not header: break
                    fields = struct.unpack(">9I",header)
                    if fields[0] == 0x00010008:
                        yield f, fields
                    fp = f.tell()
                    offset = fp % blocksize
                    if offset:
                        f.seek(blocksize - offset,1)
                f.close()
  1019. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Headers 26 import os,

    struct cachefiles = [('_CACHE_001_',256),('_CACHE_002_',1024), ('_CACHE_003_',4096)] def generate_headers(cachedir): for name, blocksize in cachefiles: pathname = os.path.join(cachedir,name) f = open(pathname,"rb") f.seek(4096) while True: header = f.read(36) if not header: break fields = struct.unpack(">9I",header) if fields[0] == 0x00010008: yield f, fields fp = f.tell() offset = fp % blocksize if offset: f.seek(blocksize - offset,1) f.close() We loop over each _CACHE_00N_ file one by one. Open each file and skip the 4096 byte block bit-map at the beginning
  1020. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Headers 27 import os,

    struct cachefiles = [('_CACHE_001_',256),('_CACHE_002_',1024), ('_CACHE_003_',4096)] def generate_headers(cachedir): for name, blocksize in cachefiles: pathname = os.path.join(cachedir,name) f = open(pathname,"rb") f.seek(4096) while True: header = f.read(36) if not header: break fields = struct.unpack(">9I",header) if fields[0] == 0x00010008: yield f, fields fp = f.tell() offset = fp % blocksize if offset: f.seek(blocksize - offset,1) f.close() We read the file and look for metadata headers (look for the magic bytes)
  1021. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Headers 28 import os,

    struct cachefiles = [('_CACHE_001_',256),('_CACHE_002_',1024), ('_CACHE_003_',4096)] def generate_headers(cachedir): for name, blocksize in cachefiles: pathname = os.path.join(cachedir,name) f = open(pathname,"rb") f.seek(4096) while True: header = f.read(36) if not header: break fields = struct.unpack(">9I",header) if fields[0] == 0x00010008: yield f, fields fp = f.tell() offset = fp % blocksize if offset: f.seek(blocksize - offset,1) f.close() Yield the file handle and header fields that we read.
  1022. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Headers 29 import os,

    struct cachefiles = [('_CACHE_001_',256),('_CACHE_002_',1024), ('_CACHE_003_',4096)] def generate_headers(cachedir): for name, blocksize in cachefiles: pathname = os.path.join(cachedir,name) f = open(pathname,"rb") f.seek(4096) while True: header = f.read(36) if not header: break fields = struct.unpack(">9I",header) if fields[0] == 0x00010008: yield f, fields fp = f.tell() offset = fp % blocksize if offset: f.seek(blocksize - offset,1) f.close() Skip to the start of the next block (we look at the current file pointer to compute a skip value)
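The skip arithmetic is worth checking with real numbers. This sketch (Python 3 syntax) pulls the computation into a standalone function; `skip_to_next_block` is a hypothetical name, and it returns an absolute position instead of calling `f.seek()` so it can be tried without a cache file.

```python
def skip_to_next_block(fp, blocksize):
    """Return the position of the next block boundary at or after fp."""
    offset = fp % blocksize
    if offset:
        fp += blocksize - offset
    return fp

# After the 4096-byte bitmap and one 36-byte header we are at 4132.
print(skip_to_next_block(4132, 256))     # next 256-byte boundary
print(skip_to_next_block(4132, 1024))    # next 1024-byte boundary
print(skip_to_next_block(4352, 256))     # already on a boundary
```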
  1023. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Headers • Example of

    using the previous function 30 for f, header in generate_headers("FFCache"): print f, header • Example output... <open file 'FFCache/_CACHE_001_'> (65544, 0, 1, 1192968652, 119 <open file 'FFCache/_CACHE_001_'> (65544, 0, 1, 1192970452, 119 <open file 'FFCache/_CACHE_001_'> (65544, 0, 1, 1192972252, 119 ... <open file 'FFCache/_CACHE_002_'> (65544, 2701132042L, 2, 11928 <open file 'FFCache/_CACHE_002_'> (65544, 0, 1, 1192892697, 119 <open file 'FFCache/_CACHE_002_'> (65544, 0, 1, 1192892704, 119
  1024. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Metadata • Let's add

    to the processing pipeline 31 def generate_meta(headers): for f, header in headers: urlstr = f.read(header[7]) infostr = f.read(header[8]) yield header, urlstr, infostr • Example: headers = generate_headers("FFCache") meta = generate_meta(headers) for header, url, info in meta: print header print url print info print
  1025. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Metadata • Example output

    32 (65544, 0, 1, 1192892514, 1192892514, 4294967295L, 0, 92, 292) HTTP:http://en-us.start.mozilla.com/firefox?client=firefox-a&rl request-method^@GET^@response-head^@HTTP/1.1 301 Moved Permanen Location: http://www.google.com/firefox?client=firefox-a&rls=or Content-Type: text/html Server: gws Content-Encoding: gzip Date: Sat, 20 Oct 2007 15:01:56 GMT Cache-Control: private, x-gzip-ok="" ^@ (65544, 0, 1, 1192892514, 1192892514, 0, 1524, 83, 225) HTTP:http://www.google.com/firefox?client=firefox-a&rls=org.moz request-method^@GET^@response-head^@HTTP/1.1 200 OK Cache-Control: private Content-Type: text/html; charset=UTF-8 Content-Encoding: gzip Server: gws Content-Length: 1524
  1026. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Requests • Extract URLs

    and fetchtime
        def generate_requests(meta):
            for header, rawurl, info in meta:
                fetchtime = header[3]
                url = rawurl.split(":",1)[1].strip('\x00')
                yield url, fetchtime
    • Example:
        import time
        headers = generate_headers("FFCache")
        meta = generate_meta(headers)
        requests = generate_requests(meta)
        for url, ftime in requests:
            print url
            print time.ctime(ftime)
            print
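Here is the string handling on its own, as a sketch in Python 3 syntax. The raw string imitates the `HTTP:` prefix and NUL terminator seen in the metadata output; `parse_request` is a made-up helper name. The timestamp is formatted with `time.gmtime` so the result does not depend on the local time zone (the slides use `time.ctime`, which prints local time).

```python
import time

def parse_request(rawurl):
    # Drop the "HTTP:" prefix, then the trailing NUL padding.
    return rawurl.split(":", 1)[1].strip('\x00')

raw = "HTTP:http://www.google.com/images/firefox/fox1.gif\x00"
url = parse_request(raw)

# fetchtime is seconds since the Unix epoch.
fetchtime = 1192892514
stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(fetchtime))

print(url)
print(stamp)
```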
  1027. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Requests • Sample output

    34 http://www.google.com/images/firefox/grgrad.gif Sat Oct 20 10:01:54 2007 http://www.google.com/images/firefox/clear.gif Sat Oct 20 10:01:54 2007 http://www.google.com/images/firefox/title.gif Sat Oct 20 10:01:54 2007 http://www.google.com/images/firefox/fox1.gif Sat Oct 20 10:01:54 2007 http://www.google.com/images/firefox/fox2.gif Sat Oct 20 10:01:54 2007 http://www.google.com/images/firefox/google.gif Sat Oct 20 10:01:54 2007
  1028. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Domains • Generate domain

    names from requests 35 def generate_domains(requests): for url, fetchtime in requests: proto,request = url.split("://",1) domain = request.split("/",1)[0] yield domain • Example: headers = generate_headers("FFCache") meta = generate_meta(headers) requests = generate_requests(meta) domains = generate_domains(requests) for d in sorted(set(domains)): print d
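The two `split()` calls can be checked against the standard library's URL parser. This is a sketch in Python 3 syntax, where the parser lives in `urllib.parse`. The sample URLs are illustrative and `domain_of` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def domain_of(url):
    # Same two-step split as on the slide.
    proto, request = url.split("://", 1)
    return request.split("/", 1)[0]

urls = [
    "http://www.google.com/images/firefox/fox1.gif",
    "http://en.wikipedia.org/wiki/Python",
    "http://www.google.com/images/firefox/fox2.gif",
]

domains = sorted(set(domain_of(u) for u in urls))
via_stdlib = sorted(set(urlsplit(u).netloc for u in urls))

print(domains)
```

For plain `http://host/path` URLs the two agree. The hand-rolled split would keep a port or `user@` prefix as part of the "domain", and so does `netloc`.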
  1029. Copyright (C) 2008, http://www.dabeaz.com 8- Generate Domains • Example output:

    36 ad.doubleclick.net ads.cnn.com ads.pointroll.com ads.yimg.com ak.bluestreak.com altfarm.mediaplex.com ar.atwola.com buttons.blogger.com d.yimg.com edge.jobthread.com edge.quantserve.com en-us.start.mozilla.com en-us.www.mozilla.com en.wikipedia.org games.slashdot.org gdyn.cnn.com ...
  1030. Copyright (C) 2008, http://www.dabeaz.com 8- Commentary • This whole solution

    is just one big processing pipeline
        generate_headers -> generate_meta -> generate_requests -> generate_domains
  1031. Copyright (C) 2008, http://www.dabeaz.com 8- Another Example • Concatenate generated

    sequences together
        def concatenate(seq):
            for s in seq:
                for item in s:
                    yield item
    • Example: Find all domains in all caches
        all_caches = (path for path,dirlist,filelist in os.walk("/")
                      if '_CACHE_MAP_' in filelist)
        headers = concatenate(generate_headers(path) for path in all_caches)
        meta = generate_meta(headers)
        requests = generate_requests(meta)
        domains = generate_domains(requests)
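A quick sketch of concatenation in Python 3 syntax, next to the standard library's `itertools.chain.from_iterable`, which does the same job. The inner loop yields each `item`, not the sequence `s` it came from. The sample data is made up.

```python
from itertools import chain

def concatenate(seq):
    # Flatten one level: yield every item of every sequence in turn.
    for s in seq:
        for item in s:
            yield item

parts = [[1, 2], [3], [4, 5]]

flat = list(concatenate(parts))
flat_stdlib = list(chain.from_iterable(parts))

print(flat)
```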
  1032. Copyright (C) 2008, http://www.dabeaz.com 8- Programming Problem • Dave's hedge

    fund 39 INSANE MONEY w/ GUIDO PY 142.34 (+8.12) JV 34.23 (-4.23) CPP 4.10 (-1.34) NET 14.12 (-0.50) After watching 87 straight hours of "Guido's Insane Money" on his Tivo, Dave has decided to quit his day job as a jazz musician and start a hedge fund. • Problem : Write a program that can process infinite streams of real-time stock market data
  1033. Copyright (C) 2008, http://www.dabeaz.com 8- A Log File • Suppose

    a sequence of real-time data is being written to a log (stocklog) 40 unix % tail -f stocklog.dat "MCD",50.80,"6/11/2007","09:30.00",-0.61,51.47,50.80,50.80,92400 "KO",51.63,"6/11/2007","09:30.00",-0.04,51.67,51.63,51.63,395215 "MMM",85.75,"6/11/2007","09:30.00",-0.19,85.94,85.75,85.75,15610 "JNJ",62.08,"6/11/2007","09:30.00",-0.05,62.89,62.08,62.08,25340 "AXP",62.39,"6/11/2007","09:30.01",-0.65,62.79,62.39,62.38,83462 ... • Again let's use generators...
  1034. Copyright (C) 2008, http://www.dabeaz.com 8- Tailing a File • A

    Python version of 'tail -f' 41 import time def follow(thefile): thefile.seek(0,2) # Go to the end of the file while True: line = thefile.readline() if not line: time.sleep(0.1) # Sleep briefly continue yield line • Idea : Seek to the end of the file and repeatedly try to read new lines. If new data is written to the file, we'll pick it up.
  1035. Copyright (C) 2008, http://www.dabeaz.com 8- Example • Using our follow

    function 42 stocklog = open("stocklog.dat","r") lines = follow(stocklog) ... for line in lines: print line, • This produces the same output as 'tail -f'
  1036. Copyright (C) 2008, http://www.dabeaz.com 8- Splitting into Fields • The

    lines of stock data are CSV formatted • Let's route them through the CSV module 43 import csv stocklog = open("stocklog.dat","r") lines = follow(stocklog) fields = csv.reader(lines) for f in fields: print f
  1037. Copyright (C) 2008, http://www.dabeaz.com 8- Splitting into Fields • Example

    output: 44 ['MCD', '51.28', '6/11/2007', '09:37.40', '-0.13', '51.47', '51.28 ['C', '53.11', '6/11/2007', '09:37.41', '-0.22', '53.20', '53.11', ['VZ', '43.00', '6/11/2007', '09:37.41', '-0.07', '42.95', '43.00' ['WMT', '49.75', '6/11/2007', '09:37.41', '-0.33', '49.90', '49.87 ['MRK', '50.37', '6/11/2007', '09:37.41', '0.23', '50.30', '50.37' ['PFE', '26.43', '6/11/2007', '09:37.41', '-0.09', '26.50', '26.43 ['AXP', '62.64', '6/11/2007', '09:37.42', '-0.40', '62.79', '62.64 • An infinite sequence of lists • Each list is a list of strings
  1038. Copyright (C) 2008, http://www.dabeaz.com 8- Field Conversion • Let's write

    a function that converts a list of strings into a list of converted values
        def convert_fields(types,fields):
            return [ty(val) for ty,val in zip(types,fields)]
    • Example:
        fields = ['IBM', '103.23', '6/11/2007', '09:43.46', '0.16',
                  '102.87', '103.23', '102.77', '345196']
        types = [ str, float, str, str, float, float, float, float, int]
        cfields = convert_fields(types,fields)
        # cfields = [ 'IBM', 103.23, '6/11/2007', '09:43.46',
        #             0.16, 102.87, 103.23, 102.77, 345196 ]
  1039. Copyright (C) 2008, http://www.dabeaz.com 8- Field Conversion • Let's add

    that to our processing pipeline 46 import csv stocklog = open("stocklog.dat","r") lines = follow(stocklog) fields = csv.reader(lines) fieldtypes = [str,float,str,str, float,float,float,float,int] converted = (convert_fields(fieldtypes,f) for f in fields) for s in converted: print s • This now produces lists of converted values
  1040. Copyright (C) 2008, http://www.dabeaz.com 8- Field Conversion • Example output:

    47 ['MCD', 51.28, '6/11/2007', '09:37.40', -0.13, 51.47, 51.28, 50.80 ['C', 53.11, '6/11/2007', '09:37.41', -0.22, 53.20, 53.11, 52.99', ['VZ', 43.00, '6/11/2007', '09:37.41', -0.07, 42.95, 43.00, 42.78' ['WMT', 49.75, '6/11/2007', '09:37.41', -0.33, 49.90, 49.87, 49.75 ['MRK', 50.37, '6/11/2007', '09:37.41', 0.23, 50.30, 50.37, 49.66' ['PFE', 26.43, '6/11/2007', '09:37.41', -0.09, 26.50, 26.43, 26.31 ['AXP', 62.64, '6/11/2007', '09:37.42', -0.40, 62.79, 62.64, 62.38
  1041. Copyright (C) 2008, http://www.dabeaz.com 8- Making Dictionaries • Let's put

    all of the fields into a dictionary 48 import csv stocklog = open("stocklog.dat","r") lines = follow(stocklog) fields = csv.reader(lines) fieldtypes = [str,float,str,str, float,float,float,float,int] converted = (convert_fields(fieldtypes,f) for f in fields) fieldnames = ['name','price','date','time', 'change','open','high','low','volume'] stocks = (dict(zip(fieldnames,c)) for c in converted)
  1042. Copyright (C) 2008, http://www.dabeaz.com 8- Making Dictionaries • Example output:

    49 {'volume': 584485, 'name': 'IBM', 'price': 103.56, 'high': 103.59999999999999, 'low': 102.77, 'time': '09:57.57', 'date': '6/11/2007', 'open': 102.87, 'change': 0.48999999999999999} {'volume': 441703, 'name': 'CAT', 'price': 78.739999999999995, 'high': 78.879999999999995, 'low': 77.989999999999995, 'time': '09:57.59', 'date': '6/11/2007', 'open': 78.319999999999993, 'change': 0.22} {'volume': 372369, 'name': 'DD', 'price': 51.130000000000003, 'high': 51.18, 'low': 50.600000000000001, 'time': '09:58.01', 'date': '6/11/2007', 'open': 51.130000000000003, 'change': 0.0}
  1043. Copyright (C) 2008, http://www.dabeaz.com 8- Interlude • We started with

    an infinite input source • We then routed lines from that source through a processing pipeline that produces an infinite sequence of dictionaries
        stocklog.dat -> follow -> csv.reader -> convert -> makedict -> { }
        (data passed between stages: lines, lines, lists, lists, dicts)
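To try the pipeline without a live log, the infinite source can be swapped for a finite one. In this sketch (Python 3 syntax) an `io.StringIO` holding two rows from the stocklog sample stands in for `follow(stocklog)`. That substitution is an assumption made for testing. The remaining stages are unchanged.

```python
import csv
import io

def convert_fields(types, fields):
    return [ty(val) for ty, val in zip(types, fields)]

# Finite stand-in for follow(): two rows from the stocklog sample.
data = io.StringIO(
    '"MCD",50.80,"6/11/2007","09:30.00",-0.61,51.47,50.80,50.80,92400\n'
    '"KO",51.63,"6/11/2007","09:30.00",-0.04,51.67,51.63,51.63,395215\n'
)

fieldtypes = [str, float, str, str, float, float, float, float, int]
fieldnames = ['name', 'price', 'date', 'time',
              'change', 'open', 'high', 'low', 'volume']

fields = csv.reader(data)
converted = (convert_fields(fieldtypes, f) for f in fields)
stocks = (dict(zip(fieldnames, c)) for c in converted)

first = next(stocks)    # pull one dictionary through the pipeline
rest = list(stocks)     # safe here only because the source is finite

print(first['name'], first['price'], first['volume'])
```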
  1044. Copyright (C) 2008, http://www.dabeaz.com 8- Using the Results • Dave

    has a portfolio of stocks 51 portfolio = set(['IBM','MSFT','HPQ','CAT','AA']) • Write a program that prints out a real-time ticker showing the name, price, change, and volume for just these stocks in_portfolio = (s for s in stocks if s['name'] in portfolio) ticker = ((s['name'],s['price'],s['change'],s['volume']) for s in in_portfolio) for t in ticker: print "%10s %10.2f %10.2f %10d" % t
  1045. Copyright (C) 2008, http://www.dabeaz.com 8- Example: • Only print ticker

    data for negative change 52 in_portfolio = (s for s in stocks if s['name'] in portfolio) ticker = ((s['name'],s['price'],s['change'],s['volume']) for s in in_portfolio) negticker = (t for t in ticker if t[2] < 0) for t in negticker: print "%10s %10.2f %10.2f %10d" % t
  1046. Copyright (C) 2008, http://www.dabeaz.com 8- Commentary • That's probably enough

    with generators • Concept of a data processing pipeline is pretty powerful if you know how to apply it • Head explosion? 53
  1047. Copyright (C) 2008, http://www.dabeaz.com 8- Overview • Dynamic languages are

    used heavily in network programming applications • Processing different file formats • Interacting with web servers • Implementing network servers, etc. 55
  1048. Copyright (C) 2008, http://www.dabeaz.com 8- Example : Web Server 56

    • This is a complete Python web-server with support for CGI scripting from BaseHTTPServer import HTTPServer from CGIHTTPServer import CGIHTTPRequestHandler import os os.chdir("/home/docs/html") serv = HTTPServer(("",8080),CGIHTTPRequestHandler) serv.serve_forever() • Serves HTML files and executes scripts in "/cgi-bin" and "/htbin" directories
  1049. Copyright (C) 2008, http://www.dabeaz.com 8- Basic Principles • Using a

    lot of these network features is mostly a matter of reading the manual • There are various libraries and frameworks • Instead of talking about that, will cover absolute basics of network programming • Material most good programmers should just know about 57
  1050. Copyright (C) 2008, http://www.dabeaz.com 8- The Problem • Two main

    issues • Addressing - locating computers and services • Data transport - moving bits around 59
  1051. Copyright (C) 2008, http://www.dabeaz.com 8- Network Addressing • Computers on

    network have a hostname • Hostname mapped to numerical address (e.g., IP address, DNS) 60 Network foo.bar.com 205.172.13.4 www.python.org 82.94.237.218
  1052. Copyright (C) 2008, http://www.dabeaz.com 8- Ports • Connections are made

    between "ports" • Ports are bound to running processes/services 61 foo.bar.com 205.172.13.4 web email IM Port 80 Port 25 Port 31337 browser sendmail Port 7823 Port 3342
  1053. Copyright (C) 2008, http://www.dabeaz.com 8- Connections • A network connection

    involves connecting to a host address and port • Expressed as a pair (address,port) • Examples: 62 ("www.python.org",80) ("205.172.13.4",443)
  1054. Copyright (C) 2008, http://www.dabeaz.com 8- Client/Server Concept • Servers wait

    for incoming connections and provide some kind of service (e.g., web) • Clients make connections to servers 63 www.bar.com 205.172.13.4 web Port 80 browser Client Server • To make it work, servers use standardized port numbers (e.g., web server always on port 80)
  1055. Copyright (C) 2008, http://www.dabeaz.com 8- Standard Ports • Some commonly

    used port assignments
        21     FTP
        22     SSH
        23     Telnet
        25     SMTP (Mail)
        80     HTTP (Web)
        110    POP3 (Mail)
        119    NNTP (News)
        443    HTTPS (web)
    • Ports 1-1023 reserved by system (privileged)
    • Ports 1024-65535 available to all
  1056. Copyright (C) 2008, http://www.dabeaz.com 8- Request/Response Cycle • Most network

    applications use a request/response programming model
    • Client sends a request (e.g., HTTP)
        GET /index.html HTTP/1.0
    • Server sends a response (e.g., HTTP)
        HTTP/1.0 200 OK
        Content-type: text/html
        Content-length: 48823

        <HTML> ...
    • Actual protocol depends on the application
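At this level a request and a response are just formatted text, so the cycle can be sketched with plain strings and no network at all. The helpers below (Python 3 syntax) are hypothetical names. The one protocol detail they rely on is that HTTP lines end with CRLF and a blank line ends the headers.

```python
def build_request(path):
    # Request line, then an empty line to end the (empty) header block.
    return "GET %s HTTP/1.0\r\n\r\n" % path

def parse_status(response):
    # The first line looks like: HTTP/1.0 200 OK
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

request = build_request("/index.html")

# A canned response in the shape shown on the slide.
response = ("HTTP/1.0 200 OK\r\n"
            "Content-type: text/html\r\n"
            "Content-length: 48823\r\n"
            "\r\n"
            "<HTML> ...")

status = parse_status(response)
print(repr(request), status)
```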
  1057. Copyright (C) 2008, http://www.dabeaz.com 8- Sockets • Programming abstraction for

    network code • Socket: A communication endpoint 66 socket socket • Supported by socket library module • Allows data to be written/read (e.g., like a file) network
  1058. Copyright (C) 2008, http://www.dabeaz.com 8- Socket Basics • To create a socket

        import socket
        s = socket.socket(addr_family, type)
    • Address families
        socket.AF_INET     Internet protocol (IPv4)
        socket.AF_INET6    Internet protocol (IPv6)
    • Socket types
        socket.SOCK_STREAM    Connection based stream (TCP)
        socket.SOCK_DGRAM     Datagrams (UDP)
    • Example:
        >>> from socket import *
        >>> s = socket(AF_INET,SOCK_STREAM)
  1059. Copyright (C) 2008, http://www.dabeaz.com 8- Socket Types • Internet Protocol

    • Almost all code will use one of the following
        s = socket(AF_INET, SOCK_STREAM)
        s = socket(AF_INET, SOCK_DGRAM)
    • Most common case: TCP connection
        s = socket(AF_INET, SOCK_STREAM)
  1060. Copyright (C) 2008, http://www.dabeaz.com 8- Using a Socket • Creating

    a socket is only the first step 69 s = socket(AF_INET, SOCK_STREAM) • Further use depends on application • Server • Listen for incoming connections • Client • Make an outgoing connection
  1061. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Connections 70 • Computers

    establish a dedicated connection • Bi-directional data transfer • Continuous I/O stream (like a file, pipe, etc.) • Reliable • Connection stays open until explicitly closed DATA TCP/IP socket(AF_INET,SOCK_STREAM)
  1062. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Clients • Using a

    socket to make a connection from socket import * s = socket(AF_INET,SOCK_STREAM) s.connect(("www.python.org",80)) s.send("GET /index.html HTTP/1.0\n\n") data = s.recv(10000) s.close() 71 • s.connect(addr) makes a connection s.connect(("www.python.org",80)) • Once connected, use send(),recv() to transmit and receive data • close() shuts down the connection
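One detail the client above glosses over: a single recv() call returns only what has arrived so far, so a large response can come back truncated. A minimal sketch of reading until the server closes the connection follows. It is written for Python 3, where sockets carry bytes rather than the Python 2 strings used on the slides, and the helper names recv_until_closed and fetch are made up for illustration.

```python
import socket

def recv_until_closed(sock, bufsize=4096):
    """Collect data from sock until the peer closes its end."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:            # b"" means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)

def fetch(host, port, request):
    """Hypothetical helper: connect, send one request, read the full reply."""
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(request)       # sendall() keeps going until every byte is sent
        return recv_until_closed(s)
```

A call such as fetch("www.python.org", 80, b"GET /index.html HTTP/1.0\r\n\r\n") mirrors the slide. Reading until close suits HTTP/1.0, where the server normally closes the connection after the response.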
  1063. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • A simple

    server 72 from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() • Send a message back to a client % telnet localhost 9000 Connected to localhost. Escape character is '^]'. Hello 127.0.0.1 Connection closed by foreign host. % Server message
  1064. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • Address binding

    from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() • s.bind() binds the socket to a specific address • Addressing: s.bind(("",9000)) binds to all available interfaces; s.bind(("localhost",9000)) binds to localhost only; s.bind(("192.168.2.1",9000)) or s.bind(("104.21.4.2",9000)) binds to one specific address if the system has multiple IP addresses
  1065. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • Start listening

    for connections 74 from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() • s.listen(backlog) • backlog is # of pending connections to allow • Note: not related to number of clients Tells system to start listening for connections on the socket
  1066. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • Accepting a

    new connection 75 from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() • s.accept() blocks until connection received • Server sleeps if nothing is happening Accept a new client connection
  1067. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • Client socket

    and address from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() • accept() returns a pair (client_socket,addr) • client_socket, e.g. <socket._socketobject object at 0x3be30>: a new socket that's used for data • addr, e.g. ("104.23.11.4",27743): the network/port address of the client that connected
  1068. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • Sending data

    77 from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() Send data to client Note: Using the client socket, not the server socket
  1069. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • Closing the

    connection 78 from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() Close client connection • Note: Server can keep client connection alive as long as it wants • Can repeatedly receive/send data
  1070. Copyright (C) 2008, http://www.dabeaz.com 8- TCP Server • Waiting for

    the next connection 79 from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() print "Received connection from", a c.send("Hello %s\n" % a[0]) c.close() Wait for next connection • Original server socket is reused for further connections • Server runs forever
  1071. Copyright (C) 2008, http://www.dabeaz.com 8- An Example • A Stock

    Price Server • Suppose there is a dictionary with prices 80 prices = { } for line in open("prices.dat"): fields = line.split(",") prices[fields[0]] = float(fields[1]) >>> prices['IBM'] 102.86 >>> prices['AA'] 39.48 >>> • Turn this into a server where clients can connect and get the prices
  1072. Copyright (C) 2008, http://www.dabeaz.com 8- Solution 81 import socket s

    = socket.socket(socket.AF_INET,socket.SOCK_STREAM) s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() for name in prices: c.sendall("%s,%0.2f\n" % (name, prices[name])) c.close() • To test: % telnet localhost 9000 AXP,62.58 BA,98.31 DD,50.75 CAT,78.29 AIG,71.38 ...
  1073. Copyright (C) 2008, http://www.dabeaz.com 8- An Example • Modify the

    last example so specific prices can be requested and returned • Allow a list of stock names to be sent 82 % telnet localhost 9000 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. IBM AA CAT <newline> CAT,78.29 IBM,102.86 AA,39.48 Connection closed by foreign host. %
  1074. Copyright (C) 2008, http://www.dabeaz.com 8- Solution 83 import socket s

    = socket.socket(socket.AF_INET,socket.SOCK_STREAM) s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() f = c.makefile() nameline = f.readline() nameset = nameline.split() for name in prices: if not nameset or name in nameset: c.sendall("%s,%0.2f\n" % (name, prices[name])) f.close() c.close()
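The request handling in this solution can be pulled out of the socket loop and exercised on its own. A small sketch, assuming a helper named select_prices (not in the original) and sample prices taken from the earlier slides. Sorting the names is an addition so the output order is predictable; the slide code iterates the dictionary directly.

```python
def select_prices(prices, nameline):
    """Return the reply lines for one request line of stock names."""
    nameset = set(nameline.split())
    lines = []
    for name in sorted(prices):
        # An empty request line means "send everything"
        if not nameset or name in nameset:
            lines.append("%s,%0.2f\n" % (name, prices[name]))
    return lines

prices = {"IBM": 102.86, "AA": 39.48, "CAT": 78.29}   # sample data
reply = "".join(select_prices(prices, "IBM CAT\n"))
print(reply)
```

In the server loop, the result would be written back with something like c.sendall("".join(lines)), encoded to bytes under Python 3.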
  1075. Copyright (C) 2008, http://www.dabeaz.com 8- UDP Networking 84 • Data

    sent in discrete packets (Datagrams) • No concept of a "connection" • No reliability, no ordering of data • Datagrams may be lost, arrive in any order • Higher performance (used in games, etc.) DATA DATA DATA socket(AF_INET,SOCK_DGRAM)
  1076. Copyright (C) 2008, http://www.dabeaz.com 8- UDP Client • Sending a

    datagram to a server 85 from socket import * s = socket(AF_INET,SOCK_DGRAM) s.sendto(msg,("server.com",10000)) data, addr = s.recvfrom(maxsize) Create datagram socket • Key concept: No "connection" • You just send a data packet Send a message Wait for a response returned data remote address
  1077. Copyright (C) 2008, http://www.dabeaz.com 8- UDP Server • A simple

    datagram server 86 from socket import * s = socket(AF_INET,SOCK_DGRAM) s.bind(("",10000)) while True: data, addr = s.recvfrom(maxsize) # Do something ... s.sendto(resp,addr) Create datagram socket • Much simpler than a TCP server • Again: No "connection" is established Bind to a specific port Wait for a message Send response
  1078. Copyright (C) 2008, http://www.dabeaz.com 8- An Example • Create a

    UDP stock price server • Server receives "names" • Responds with the price 87
  1079. Copyright (C) 2008, http://www.dabeaz.com 8- Solution 88 import socket s

    = socket.socket(socket.AF_INET,socket.SOCK_DGRAM) s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1) s.bind(("",10000)) while True: name,addr = s.recvfrom(16) price = prices.get(name,0.0) s.sendto("%0.2f" % price,addr)
  1080. Copyright (C) 2008, http://www.dabeaz.com 8- Solution (UDP Client) 89 import

    socket price_socket = socket.socket(socket.AF_INET,socket.SOCK_DGRAM) price_server = ("",10000) def get_price(name): price_socket.sendto(name,price_server) price,addr = price_socket.recvfrom(32) return float(price) • Example use: >>> get_price('IBM') 102.86 >>> get_price('AA') 39.479999999999997 >>>
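For readers on Python 3, the same UDP exchange needs explicit bytes on the wire. The sketch below is one way to write it, not the course's own code: serve_one handles a single request so it can be tried without an endless loop, and the prices dictionary is sample data.

```python
import socket

prices = {"IBM": 102.86, "AA": 39.48}    # sample data, not read from prices.dat

def serve_one(server_sock):
    """Handle exactly one price request on an already-bound UDP socket."""
    data, addr = server_sock.recvfrom(16)
    name = data.decode("ascii", "replace").strip()
    price = prices.get(name, 0.0)        # unknown names get 0.00, as on the slide
    server_sock.sendto(("%0.2f" % price).encode("ascii"), addr)

def get_price(client_sock, server_addr, name):
    """Send a name and wait for the price that comes back."""
    client_sock.sendto(name.encode("ascii"), server_addr)
    reply, _ = client_sock.recvfrom(32)
    return float(reply.decode("ascii"))
```

Since datagrams may be lost, a real client would probably call client_sock.settimeout() first and retry or give up when recvfrom() times out.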
  1081. Copyright (C) 2008, http://www.dabeaz.com 8- Commentary • Sockets are the

    lowest level of network programming • If you know what you are doing, you can use sockets to write programs that interact with any other program on the network • Of course, the low-level details might be really hairy 90
  1082. Copyright (C) 2008, http://www.dabeaz.com 8- Network Protocols • There are

    a fairly standard set of common network protocols • HTTP (Web) • FTP • SMTP (email) • etc... 92
  1083. Copyright (C) 2008, http://www.dabeaz.com 8- Network Libraries • Most dynamic

    languages already have built-in library modules for common protocols • Will give a few examples... 93
  1084. Copyright (C) 2008, http://www.dabeaz.com 8- urllib Module • Open a

    web page: urlopen() >>> import urllib >>> u = urllib.urlopen("http://www.python.org/index.html") >>> data = u.read() >>> print data <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML ... ... >>> • urlopen() returns a file-like object • Can use standard file operations on it
  1085. Copyright (C) 2008, http://www.dabeaz.com 8- Web Servers • Suppose you

    wanted to implement a completely customized HTTP server • A lot of web frameworks have an option to run "stand-alone" • Let's see how that works... 95
  1086. Copyright (C) 2008, http://www.dabeaz.com 8- HTTP Protocol • Clients send

    a request GET /index.html HTTP/1.1 Host: www.python.org User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; Accept: text/xml,application/xml,application/xhtml+xml,text/h Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive <blank line> • Request line followed by headers • Terminated by a blank line
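To make the "request line, headers, blank line" layout concrete, here is a rough Python 3 sketch that splits a raw request into its parts. The function name parse_request_head is invented for this example, and a production server would lean on a library rather than hand-parsing.

```python
def parse_request_head(raw):
    """Split a raw HTTP request head into (method, path, version, headers)."""
    head, _, _ = raw.partition(b"\r\n\r\n")       # stop at the blank line
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

raw = (b"GET /index.html HTTP/1.1\r\n"
       b"Host: www.python.org\r\n"
       b"Connection: keep-alive\r\n"
       b"\r\n")
method, path, version, headers = parse_request_head(raw)
```

Header names are lower-cased here because HTTP treats them as case-insensitive.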
  1087. Copyright (C) 2008, http://www.dabeaz.com 8- HTTP Protocol • Server sends

    a response HTTP/1.1 200 OK Date: Thu, 26 Apr 2007 19:54:01 GMT Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_py Last-Modified: Thu, 26 Apr 2007 18:40:24 GMT ETag: "61b82-37eb-5a0eb600" Accept-Ranges: bytes Content-Length: 14315 Connection: close Content-Type: text/html <HTML> ... • Response line followed by headers • Blank line followed by data
  1088. Copyright (C) 2008, http://www.dabeaz.com 8- HTTP Protocol • There are

    a small number of request types GET POST HEAD PUT • This isn't an exhaustive tutorial • There are standardized response codes 200 OK 403 Forbidden 404 Not Found 501 Not implemented ...
  1089. Copyright (C) 2008, http://www.dabeaz.com 8- Customized HTTP • Can define

    a custom class... 99 from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer class MyHandler(BaseHTTPRequestHandler): def do_GET(self): ... def do_POST(self): ... def do_HEAD(self): ... def do_PUT(self): ... serv = HTTPServer(("",8080),MyHandler) serv.serve_forever() Redefine the behavior of the server by defining code for all of the standard HTTP request types
  1090. Copyright (C) 2008, http://www.dabeaz.com 8- Customized HTTP • Example: Hello

    World from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer class EchoHandler(BaseHTTPRequestHandler): def do_GET(self): if self.path == '/hello.html': self.send_response(200,"OK") self.send_header('Content-type','text/plain') self.end_headers() self.wfile.write("Hello World!\n") else: self.send_response(404,"Not found") self.send_header('Content-type','text/plain') self.end_headers() self.wfile.write("I don't know.\n") serv = HTTPServer(("",8080),EchoHandler) serv.serve_forever()
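In Python 3 the BaseHTTPServer module lives on as http.server, and response bodies must be bytes. A sketch of the same Hello World handler under those assumptions; HelloHandler, _reply and make_http_server are names chosen for this example only.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/hello.html":
            self._reply(200, b"Hello World!\n")
        else:
            self._reply(404, b"I don't know.\n")

    def _reply(self, code, body):
        # Hypothetical helper: status line, headers, then the body
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)           # Python 3 needs bytes here

    def log_message(self, format, *args):
        pass                             # keep the sketch quiet

def make_http_server(port=8080):
    """Build the server; call serve_forever() on the result to run it."""
    return HTTPServer(("", port), HelloHandler)
```

Running it is then make_http_server().serve_forever(), which blocks just like the slide version.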
  1091. Copyright (C) 2008, http://www.dabeaz.com 8- XML-RPC • Remote Procedure Call

    • Uses HTTP as a transport protocol • Parameters/Results encoded in XML • May be used in conjunction with AJAX (Asynchronous Javascript and XML) 101
  1092. Copyright (C) 2008, http://www.dabeaz.com 8- Simple XML-RPC • How to

    create a stand-alone server 102 from SimpleXMLRPCServer import SimpleXMLRPCServer def add(x,y): return x+y s = SimpleXMLRPCServer(("",8080)) s.register_function(add) s.serve_forever() • How to test it (xmlrpclib) >>> import xmlrpclib >>> s = xmlrpclib.ServerProxy("http://localhost:8080") >>> s.add(3,5) 8 >>> s.add("Hello","World") "HelloWorld" >>>
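The same server and client carry over to Python 3 with renamed modules: SimpleXMLRPCServer moved to xmlrpc.server and xmlrpclib to xmlrpc.client. A sketch under that assumption; the factory functions make_rpc_server and make_rpc_client are illustrative names, and the server is bound to the loopback address only.

```python
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(x, y):
    return x + y

def make_rpc_server(port=8080):
    """Build an XML-RPC server exposing add(); run it with serve_forever()."""
    s = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    s.register_function(add)             # exposed under the name "add"
    return s

def make_rpc_client(port=8080):
    """Return a proxy whose method calls become XML-RPC requests."""
    return ServerProxy("http://127.0.0.1:%d" % port)
```

With the server running in another process or thread, make_rpc_client().add(3, 5) performs the remote call.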
  1093. Copyright (C) 2008, http://www.dabeaz.com 8- Simple XML-RPC • Adding multiple

    functions 103 from SimpleXMLRPCServer import SimpleXMLRPCServer s = SimpleXMLRPCServer(("",8080)) s.register_function(add) s.register_function(foo) s.register_function(bar) s.serve_forever() • Registering an instance (exposes all methods) from SimpleXMLRPCServer import SimpleXMLRPCServer s = SimpleXMLRPCServer(("",8080)) obj = SomeObject() s.register_instance(obj) s.serve_forever()
  1094. Copyright (C) 2008, http://www.dabeaz.com 8- XML-RPC Commentary • XML-RPC is

    very easy to use • Can be used generally for interprocess communication (diagram: processes exchanging XML-RPC messages)
  1095. Copyright (C) 2008, http://www.dabeaz.com 8- Next time • Concurrent programming

    • Message passing/IPC • Programming with threads • More networking 106
  1096. Copyright (C) 2008, http://www.dabeaz.com 9- Principles of Dynamic Languages CSPP51060

    - Winter'08 University of Chicago David Beazley (http://www.dabeaz.com) 1
  1097. Copyright (C) 2008, http://www.dabeaz.com 9- Background 3 • You often

    have to write programs that perform multiple tasks in parallel • Example : Network servers that handle multiple client connections • Modern systems have multiple CPU cores • There is interest in writing programs that can take advantage of multiple cores to get better performance (parallel processing)
  1098. Copyright (C) 2008, http://www.dabeaz.com 9- Overview 4 • Will talk

    about some issues that come up with concurrent programming • Processes and IPC • Message Passing • Threads • Event-driven programming • Co-routines
  1099. Copyright (C) 2008, http://www.dabeaz.com 9- Disclaimer 5 • This is

    a delicate topic surrounded by tremendous peril • Could run an entire course just on this... • This is going to be more of a survey/intro
  1100. Copyright (C) 2008, http://www.dabeaz.com 9- Processes • In a previous

    class, we looked at how it is possible to create subprocesses 7 • Example: Setting up a pipe p = subprocess.Popen(['cmd'], stdin = subprocess.PIPE, stdout = subprocess.PIPE) p.stdin.write(data) # Send data p.stdin.close() # No more input result = p.stdout.read() # Read output python cmd p.stdout p.stdin stdin stdout
  1101. Copyright (C) 2008, http://www.dabeaz.com 9- Unix Process Fork • fork(),

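    wait(), _exit() import os pid = os.fork() if pid == 0: # Child process ... os._exit(0) else: # Parent process ... # Wait for child os.waitpid(pid, 0) (diagram: the parent python process calls fork(); parent and child then execute concurrently until the child calls _exit() and the parent collects it with wait())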
  1102. Copyright (C) 2008, http://www.dabeaz.com 9- Unix Process Fork • fork()

    creates an identical process • Newly created process is a "child process" • fork() returns different values in parent/child import os pid = os.fork() if pid == 0: # Child process else: # Parent process 9 pid is 0 in child, non-zero in parent • Parent and child run independently afterwards
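A small runnable sketch of the fork/exit/wait cycle (Unix only). The function run_child is a made-up name; it forks a child that exits at once with a chosen code and returns what the parent observes.

```python
import os

def run_child(exit_code):
    """Fork a child that exits with exit_code; return the code the parent sees."""
    pid = os.fork()
    if pid == 0:
        # Child process: leave immediately, skipping the parent's cleanup handlers
        os._exit(exit_code)
    # Parent process: block until this particular child has finished
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

code = run_child(7)
```

Using os._exit() rather than sys.exit() in the child avoids flushing buffers and running exit handlers that were inherited from the parent.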
  1103. Copyright (C) 2008, http://www.dabeaz.com 9- Concurrency and Networks • Many

    programmers find their way into concurrent programming by way of network programming • In order to handle multiple clients, servers must manage simultaneous network connections 10
  1104. Copyright (C) 2008, http://www.dabeaz.com 9- Sockets and Concurrency • Handling

    multiple clients 11 web Port 80 browser web web browser server clients
  1105. Copyright (C) 2008, http://www.dabeaz.com 9- Sockets and Concurrency • Each

    client has its own socket connection 12 web browser web web browser server clients # server code s = socket(AF_INET, SOCK_STREAM) ... while True: c,a = s.accept() ... a connection point for clients client data transmitted on a different socket
  1106. Copyright (C) 2008, http://www.dabeaz.com 9- Sockets and Concurrency • Connection

    process 13 web browser web web browser server clients Port 80 web browser connect accept() send()/recv()
  1107. Copyright (C) 2008, http://www.dabeaz.com 9- Sockets and Concurrency • To

    manage multiple clients, • Server must accept multiple connections and keep all connections alive • Must actively manage all client connections • Each client may be performing different tasks 14
  1108. Copyright (C) 2008, http://www.dabeaz.com 9- Forking Server (Unix) 15 import

    os from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) while True: c,a = s.accept() if os.fork() == 0: # Child process. Manage client ... c.close() os._exit(0) else: # Parent process. Clean up and go # back to wait for more connections c.close() • Each client is handled by a subprocess
  1109. Copyright (C) 2008, http://www.dabeaz.com 9- Forking Server • Server forks

    a new process to handle each client 16 Server listening Server Server Server Server fork() Client Client Client Client new clients
  1110. Copyright (C) 2008, http://www.dabeaz.com 9- It Gets Messy 17 •

    There are many ways to set up concurrent execution of clients • Each has various tradeoffs • There is often no one "right" way to do it
  1111. Copyright (C) 2008, http://www.dabeaz.com 9- Pre-forked Server (Unix) 18 s

    = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) nservers = 0 while True: # Spawn servers if nservers < maxservers: if os.fork() == 0: for i in xrange(maxrequests): c,a = s.accept() # Manage client ... c.close() os._exit(0) else: nservers += 1 else: os.wait() nservers -= 1 • Server creates copies of itself in advance • A popular approach used by Apache, etc. • Each server runs in this simple loop
  1112. Copyright (C) 2008, http://www.dabeaz.com 9- Pre-forked Server • There is

    a process pool waiting for connections 19 Server listening Server Server Server Server Client new clients Client process pool
  1113. Copyright (C) 2008, http://www.dabeaz.com 9- Commentary • The details of

    subprocesses are covered in depth in an Operating Systems course • There are a few important details • Every process is independent • If there are multiple CPUs, more than one process can run simultaneously • Processes can exchange data with each other (Interprocess Communication)
  1114. Copyright (C) 2008, http://www.dabeaz.com 9- IPC and Concurrency • Can

    structure applications as a collection of co-operating processes that work together (diagram: four interconnected processes) • Each process runs independently, but sends/receives data from other processes • Question: What are the communication options?
  1115. Copyright (C) 2008, http://www.dabeaz.com 9- IPC Options • IPC: Inter-Process

    Communication • Common Choices • Pipes • FIFOs • Sockets • Memory mapped files • Let's take a look... 23
  1116. Copyright (C) 2008, http://www.dabeaz.com 9- Pipes • An I/O channel

    between two processes 24 pipe p = subprocess.Popen(['process2'], stdin=subprocess.PIPE, stdout=subprocess.PIPE) # Send data to subprocess p.stdin.write(data) # Receive data from subprocess result = p.stdout.read() Process 1 Process 2 • A pair of "files" hooked up to a subprocess
  1117. Copyright (C) 2008, http://www.dabeaz.com 9- Pipes • Pipes are more

    commonly used to collect the output of commands executed as subprocesses • However, a pipe can be left "open" indefinitely • With proper programming, can be used as a bidirectional communication channel for exchanging data • Terminology: This is known as a "co-process"
  1118. Copyright (C) 2008, http://www.dabeaz.com 9- FIFOs • Unix FIFO queue

    (named pipe) 26 • Example: # Creating a FIFO import os os.mkfifo("fifo_A",0666) # Reading from a FIFO f = open("fifo_A","r") data = f.read(nbytes) # Writing to a FIFO f = open("fifo_A","w",0) # Unbuffered I/O f.write(data) Process 1 Process 2 FIFO FIFO
  1119. Copyright (C) 2008, http://www.dabeaz.com 9- FIFOs • With care, can

    set up elaborate communications 27 Process 1 FIFO1 Process 2 Process 3 FIFO2 FIFO3 • Each process has own FIFO for messages • Any process can send to any other process
  1120. Copyright (C) 2008, http://www.dabeaz.com 9- FIFOs • Extreme peril :

    With FIFOs, multiple processes can send data to the same target 28 Process 1 Process 2 Process 3 FIFO3 • Will cause chaos on the receiver unless you figure out some way to coordinate it
  1121. Copyright (C) 2008, http://www.dabeaz.com 9- File Locking 29 • May

    control access to the channel via file system locking or some other approach # Each process opens a lock file import fcntl f = open("/tmp/fifo","w",0) # The FIFO g = open("/tmp/fifo.lock","w") # A lock # Critical section fcntl.flock(g.fileno(),fcntl.LOCK_EX) ... f.write("Some data\n") # Write on the FIFO ... fcntl.flock(g.fileno(),fcntl.LOCK_UN) • Example: Unix
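The lock/unlock pair is easy to get wrong if the critical section raises. One possible way to package it, sketched for Unix and Python 3, is a context manager. The names locked and append_record are illustrative, and a regular file stands in for the FIFO.

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def locked(lockfile):
    """Hold an exclusive flock() on lockfile for the duration of the block."""
    fcntl.flock(lockfile.fileno(), fcntl.LOCK_EX)
    try:
        yield
    finally:
        # Released even if the critical section raises
        fcntl.flock(lockfile.fileno(), fcntl.LOCK_UN)

def append_record(path, lockpath, record):
    """Hypothetical helper: append one record while holding the lock."""
    with open(lockpath, "w") as g, locked(g):
        with open(path, "a") as f:
            f.write(record)
```

Every cooperating process has to take the same lock file; flock() is advisory, so nothing stops a process that skips the lock.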
  1122. Copyright (C) 2008, http://www.dabeaz.com 9- Interlude 30 • You're already

    starting to see problems with concurrency • Once there are multiple processes that access shared resources, you often need to coordinate control and access • Locking and synchronization • Also: Almost none of this is "portable"
  1123. Copyright (C) 2008, http://www.dabeaz.com 9- Sockets • Interprocess communication via

    network layer 31 Process 1 Process 2 Process 3 socket socket socket • Basic idea: communication via TCP, UDP, etc. • We talked a bit about this last time
  1124. Copyright (C) 2008, http://www.dabeaz.com 9- TCP Clients • Using a

    socket to make a connection from socket import * s = socket(AF_INET,SOCK_STREAM) s.connect(("some.host.com",10000)) .. # Send/receive data s.send(request) response = s.recv(10000) ... # Done. Close the connection s.close() 32
  1125. Copyright (C) 2008, http://www.dabeaz.com 9- TCP Server • A simple

    server 33 from socket import * s = socket(AF_INET,SOCK_STREAM) s.bind(("",9000)) s.listen(5) c,a = s.accept() print "Received connection from", a # Send and receive messages request = c.recv(10000) c.send(response) ... # Done c.close()
  1126. Copyright (C) 2008, http://www.dabeaz.com 9- Commentary • Sockets are most commonly

    thought of for "network programming" • Can be used as an IPC mechanism between processes running on the same machine • Networking via the loopback interface (127.0.0.1) 34
  1127. Copyright (C) 2008, http://www.dabeaz.com 9- Unix Domain Sockets • Using

    the socket API to create a "pipe" s = socket(AF_UNIX,SOCK_STREAM) s.bind("/tmp/foo") s.listen(5) c,a = s.accept() # Send/receive data request = c.recv(10000) c.send(response) ... c.close() 35 • Clients s = socket(AF_UNIX,SOCK_STREAM) s.connect("/tmp/foo") s.send(request) resp = s.recv(10000)
  1128. Copyright (C) 2008, http://www.dabeaz.com 9- Pipes versus Sockets • Depending

    on the system, pipes and FIFOs are often highly optimized in the operating system • Network layer often involves more processing steps and buffering • However, programming with pipes may be more difficult (especially FIFOs) 36
  1129. Copyright (C) 2008, http://www.dabeaz.com 9- Memory Mapped Files • Processes

    can share memory via mmap 37 Process 1 Process 2 • Idea here: processes share a mutable byte array • Changes immediately reflected in both processes • Highly optimized in the OS (no copying) memory mapped file array array
  1130. Copyright (C) 2008, http://www.dabeaz.com 9- Memory Mapped Files • Creating

    a memory mapped file # Common code import mmap SIZE = 100000 # Number of bytes f = open("shared","w+b") f.seek(SIZE,0) # Expand file to desired size f.write("\n") f.flush() # Make sure the file has actually grown # Now, memory map the file into an array (on Unix the third positional argument is flags, so pass access by keyword) m = mmap.mmap(f.fileno(),SIZE, access=mmap.ACCESS_WRITE) • This creates a shared byte array
  1131. Copyright (C) 2008, http://www.dabeaz.com 9- Memory Mapped Files • Using

    a memory mapped file m = mmap.mmap(f.fileno(),100000, access=mmap.ACCESS_WRITE) • Extract data from the memory array data = m[start:stop] • Store data in the memory array m[start:stop] = data # Data must exactly fit • Key point: Modifications to the array instantly appear in all shared copies of the file. Memory is shared, there is no copying/buffering.
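The sharing can be observed inside a single process by mapping the same file twice. This Python 3 sketch uses a temporary file and two mappings as stand-ins for two processes; the size and path are arbitrary choices for the example.

```python
import mmap
import os
import tempfile

SIZE = 4096
path = os.path.join(tempfile.mkdtemp(), "shared")

with open(path, "wb") as f:
    f.truncate(SIZE)                  # Create a file of exactly SIZE zero bytes

# Two independent mappings stand in for two processes
f1 = open(path, "r+b")
f2 = open(path, "r+b")
m1 = mmap.mmap(f1.fileno(), SIZE, access=mmap.ACCESS_WRITE)
m2 = mmap.mmap(f2.fileno(), SIZE, access=mmap.ACCESS_WRITE)

m1[0:5] = b"hello"                    # Slice assignment must match in length
seen = m2[0:5]                        # Read the same bytes through the other mapping

m1.close()
m2.close()
f1.close()
f2.close()
```

Passing access by keyword matters on Unix, where the third positional parameter of mmap.mmap() is flags rather than access.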
  1132. Copyright (C) 2008, http://www.dabeaz.com 9- Coordinating Access • Programming with

    memory mapped regions requires very careful coordination • Again, you may have to use file-locks # Each process opens the shared file (r+b, so it is not truncated) import fcntl f = open("shared","r+b") # The shared file # Critical section fcntl.flock(f.fileno(),fcntl.LOCK_EX) ... ... Some critical operation ... fcntl.flock(f.fileno(),fcntl.LOCK_UN)
  1133. Copyright (C) 2008, http://www.dabeaz.com 9- Portable IPC 41 • How

    to program with IPC • In practice, programs can be written in a manner where the actual IPC mechanism being used is hidden from application code. • Programming abstractions: • IPC via "files" • IPC via "messages"
  1134. Copyright (C) 2008, http://www.dabeaz.com 9- IPC via "Files" 42 •

    For pipes : You already get a pair of files p = subprocess.Popen(['cmd'], stdin=subprocess.PIPE, stdout=subprocess.PIPE) in_f = p.stdin out_f = p.stdout • For sockets : Can wrap with a file-layer s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) ... in_f = s.makefile("w") out_f = s.makefile("r")
  1135. Copyright (C) 2008, http://www.dabeaz.com 9- IPC via "Files" 43 •

    With a file API, you just read/write streams of characters • Processes communicate by interpreting the contents of the I/O stream • The tricky part: There is no concept of "records" or "messages" • If more than one process writes to a single stream, then you have to figure out how to coordinate it and sort out the results
  1136. Copyright (C) 2008, http://www.dabeaz.com 9- IPC via "Messages" 44 •

    General idea : Encapsulate IPC into some sort of message-passing API ch = IPC_Channel(args) ch.send(msg) # Send a message msg = ch.receive() # Receive a message • Message passing is a long established concept • However, there are dozens (if not hundreds) of libraries related to doing it. • Each with their own slightly different API
  1137. Copyright (C) 2008, http://www.dabeaz.com 9- IPC via "Messages" 46 •

    With messages, processes send well-defined chunks of data to each other. • The absolute critical operations are • send() - Send a message somewhere • receive() - Wait for a message • Let's build a message passing library...
  1138. Copyright (C) 2008, http://www.dabeaz.com 9- Message Passing 47 • Basic

    idea: Define an I/O "Channel" class Channel(object): ... • Implement methods such as the following c.send(msg) # Send a message c.receive() # Receive a message • A message is just a string of bytes
  1139. Copyright (C) 2008, http://www.dabeaz.com 9- MP Example 48 • Suppose

    we want to implement message passing over a pair of file objects inf # File open for reading outf # File open for writing • Ex: Files might be from a pipe or FIFO
  1140. Copyright (C) 2008, http://www.dabeaz.com 9- Channels 49 • Start by

    defining a channel class class FileChannel(object): def __init__(self, inf, outf): self.inf = inf self.outf = outf def send(self,msg): pass def receive(self): pass
  1141. Copyright (C) 2008, http://www.dabeaz.com 9- Sending Data 50 • Implement

    code to send a message class FileChannel(object): ... def send(self,msg): self.outf.write("%d\n" % len(msg)) self.outf.write(msg) self.outf.flush() • In this case, each message goes out as its size (on a line of its own) followed by the data: [size][msg] • This approach gives us a means of "framing" the data into records that can be easily understood by the receiver
  1142. Copyright (C) 2008, http://www.dabeaz.com 9- Receiving Data 51 • The

    opposite of sending class FileChannel(object): ... def receive(self): size_str = self.inf.readline() size = int(size_str) msg = self.inf.read(size) return msg • Note: Would probably want to add some more robust error handling (will skip for now)
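Putting send and receive together, here is a Python 3 sketch of the whole channel with a little of the error handling the slide skips. Messages are bytes, and the EOFError checks are an addition, not part of the original class. The demonstration runs both ends over one OS pipe inside a single process.

```python
import os

class FileChannel(object):
    """Length-prefixed messages over a pair of binary file objects."""
    def __init__(self, inf, outf):
        self.inf = inf
        self.outf = outf

    def send(self, msg):
        self.outf.write(b"%d\n" % len(msg))    # header: size, then newline
        self.outf.write(msg)
        self.outf.flush()

    def receive(self):
        size_str = self.inf.readline()
        if not size_str:
            raise EOFError("channel closed")   # the other side went away
        size = int(size_str)
        msg = self.inf.read(size)
        if len(msg) != size:
            raise EOFError("message truncated")
        return msg

# Demonstration over an OS pipe within one process
r, w = os.pipe()
ch = FileChannel(os.fdopen(r, "rb"), os.fdopen(w, "wb"))
ch.send(b"Hello")
ch.send(b"two\nlines")
first = ch.receive()
second = ch.receive()
```

Because the size travels ahead of the payload, a message may itself contain newlines without confusing the receiver.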
  1143. Copyright (C) 2008, http://www.dabeaz.com 9- Example : Pipes 52 •

    An Echo Client (using pipes) # client.py import sys, channel ch = channel.FileChannel(sys.stdin,sys.stdout) while True: msg = ch.receive() ch.send("Client received: %s" % msg) • This just wraps a channel around stdin/stdout
  1144. Copyright (C) 2008, http://www.dabeaz.com 9- Example : Pipes 53 •

    Use >>> import popen2,channel >>> out,inp = popen2.popen2("python client.py") >>> ch = channel.FileChannel(out,inp) >>> ch.send("Hello") >>> ch.receive() 'Client received: Hello' >>> ch.send("World") >>> ch.receive() 'Client received: World' >>>
  1145. Copyright (C) 2008, http://www.dabeaz.com 9- Example : FIFOs 54 •

    An Echo Client # echofifo.py import os, channel os.mkfifo("/tmp/echo_in") os.mkfifo("/tmp/echo_out") echo_in = open("/tmp/echo_in","rb") echo_out = open("/tmp/echo_out","wb",0) ch = channel.FileChannel(echo_in,echo_out) while True: msg = ch.receive() ch.send("Client received: %s" % msg)
  1146. Copyright (C) 2008, http://www.dabeaz.com 9- Example : FIFOs 55 •

    Use >>> import channel >>> echo_in = open("/tmp/echo_in","wb",0) >>> echo_out = open("/tmp/echo_out","rb") >>> ch = channel.FileChannel(echo_out,echo_in) >>> ch.send("Hello") >>> ch.receive() 'Client received: Hello' >>> • Note : Order in which FIFOs are opened is critical here • Client must already be started
  1147. Copyright (C) 2008, http://www.dabeaz.com 9- Example : Sockets 56 •

    An Echo Client # echosock.py import socket, channel address = ("",10000) s = socket.socket(socket.AF_INET,socket.SOCK_STREAM) s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1) s.bind(address) s.listen(1) c,a = s.accept() client_f = c.makefile() ch = channel.FileChannel(client_f,client_f) while True: msg = ch.receive() ch.send("Client received: %s" % msg)
  1148. Copyright (C) 2008, http://www.dabeaz.com 9- Example : Sockets 57 •

    Use >>> import channel >>> import socket >>> s = socket.socket(socket.AF_INET,socket.SOCK_STREAM) >>> s.connect(("",10000)) >>> client_f = s.makefile() >>> ch = channel.FileChannel(client_f,client_f) >>> ch.send("Hello") >>> ch.receive() 'Client received: Hello' >>> • Note: Echo client must already be running
  1149. Copyright (C) 2008, http://www.dabeaz.com 9- Why Message Passing? 58 •

    Message Passing is simple • Just a few basic primitives • send, receive • Can be used to build more advanced IPC programming abstractions • Remote procedure call • Distributed objects (e.g., CORBA, etc.)
  1150. Copyright (C) 2008, http://www.dabeaz.com 9- Why Message Passing? 59 •

    Easily reconfigured for different systems Process 1 Process 2 Process 3 pipe pipe pipe Process 1 Process 2 Process 3 pipe socket socket
  1151. Copyright (C) 2008, http://www.dabeaz.com 9- Why Message Passing? 60 •

    Long history of message passing • It's an established programming technique • Algorithms, properties, pitfalls are known • Scalable performance • Thousands of processors • Supercomputers
  1152. Copyright (C) 2008, http://www.dabeaz.com 9- Advanced MP 61 • Dynamic

    languages can be extremely powerful when mixed with message passing • Can build systems based on remote procedure call/distributed objects, etc. • Let's look at a simple example...
  1153. Copyright (C) 2008, http://www.dabeaz.com 9- Object Serialization 62 • Many

    languages allow objects to be "serialized" into strings • For example : pickle module in Python import pickle bytes = pickle.dumps(obj) # Turn obj into bytes obj = pickle.loads(bytes) # Turn bytes back to obj
  1154. Copyright (C) 2008, http://www.dabeaz.com 9- Object Serialization 63 • Can

    add an object serialization/unserialization step on each end of a communication channel Process 1 Process 2 serialize unserialize • Let's look at that a little further...
  1155. Copyright (C) 2008, http://www.dabeaz.com 9- Example 64 • A Pickle

    Channel import pickle class PickleChannel(object): def __init__(self,ch): self.ch = ch def send(self,obj): msg = pickle.dumps(obj) self.ch.send(msg) def receive(self): msg = self.ch.receive() return pickle.loads(msg) • This adds load/dump operations on to an existing channel
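The layering can be tried without any subprocess by putting the pickle layer on top of a trivial in-memory channel. In this sketch LoopbackChannel is a hypothetical stand-in for FileChannel; anything with send() and receive() would do.

```python
import pickle
from collections import deque

class LoopbackChannel(object):
    """Hypothetical stand-in for FileChannel: messages go into a local queue."""
    def __init__(self):
        self.queue = deque()
    def send(self, msg):
        self.queue.append(msg)
    def receive(self):
        return self.queue.popleft()

class PickleChannel(object):
    def __init__(self, ch):
        self.ch = ch
    def send(self, obj):
        self.ch.send(pickle.dumps(obj))          # object -> bytes
    def receive(self):
        return pickle.loads(self.ch.receive())   # bytes -> object

pch = PickleChannel(LoopbackChannel())
pch.send((3, 4))
pch.send({"name": "IBM", "price": 102.86})
x, y = pch.receive()
record = pch.receive()
```

One caution worth keeping in mind: unpickling data can execute arbitrary code, so a pickle channel should only connect processes that trust each other.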
  1156. Copyright (C) 2008, http://www.dabeaz.com 9- Example 65 • A sample

    subprocess : An Adder # adder.py import sys, channel fch = channel.FileChannel(sys.stdin,sys.stdout) pch = channel.PickleChannel(fch) while True: x,y = pch.receive() pch.send(x+y) • Receive two objects as input • Adds and sends the result back
  1157. Copyright (C) 2008, http://www.dabeaz.com 9- Example 66 • Using the

    adder >>> import popen2, channel >>> p_out,p_in = popen2.popen2("python adder.py") >>> fch = channel.FileChannel(p_out,p_in) >>> pch = channel.PickleChannel(fch) >>> pch.send((3,4)) >>> pch.receive() 7 >>> pch.send(("hello","world")) >>> pch.receive() "helloworld" >>> • Notice how any object that supports (+) can be sent
  1158. Remote Procedure Call

    • The adder is an example of remote procedure call
    • A subprocess/coprocess implements some functionality
    • Another process sends data (parameters) and receives results
    • Can package this up in more exotic ways
  1159. Example: RPC

    • Remote Procedure Call

        # rpcserver.py
        class RPCServer(object):
            def __init__(self,ch):
                self.ch = ch
                self.funcs = { }
            def register(self,name,func):
                self.funcs[name] = func
            def serve(self):
                while True:
                    name,args,kwargs = self.ch.receive()
                    result = self.funcs[name](*args,**kwargs)
                    self.ch.send(result)

    • Note: Needs more error checking
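One possible shape for the missing error checking, sketched in Python 3. The `SafeRPCServer` name, the `handle()` method, and the `("ok", ...)` / `("error", ...)` reply convention are all assumptions made for illustration; the slides do not specify a protocol. The idea is that a bad request produces an error reply instead of killing the server loop.

```python
class SafeRPCServer:
    """Hypothetical variant of RPCServer that never raises on a bad request."""
    def __init__(self):
        self.funcs = {}

    def register(self, name, func):
        self.funcs[name] = func

    def handle(self, request):
        """Process one (name, args, kwargs) request and return a tagged reply."""
        try:
            name, args, kwargs = request
        except (TypeError, ValueError):
            return ("error", "malformed request")
        func = self.funcs.get(name)
        if func is None:
            return ("error", "unknown function: %r" % (name,))
        try:
            return ("ok", func(*args, **kwargs))
        except Exception as e:
            # Report the failure to the caller rather than crashing the server
            return ("error", "%s: %s" % (type(e).__name__, e))

serv = SafeRPCServer()
serv.register("add", lambda x, y: x + y)
print(serv.handle(("add", (3, 4), {})))
print(serv.handle(("mul", (3, 4), {})))
```

A serve loop would then be `ch.send(serv.handle(ch.receive()))`, with the client checking the tag before using the value.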
  1160. Example: RPC

    • RPC Sample Server

        # rpcex.py
        import sys, channel
        fch = channel.FileChannel(sys.stdin,sys.stdout)
        pch = channel.PickleChannel(fch)

        def add(x,y):
            return x+y
        def sub(x,y):
            return x-y

        import rpcserver
        serv = rpcserver.RPCServer(pch)
        serv.register("add",add)
        serv.register("sub",sub)
        serv.serve()
  1161. Example: RPC

    • RPC Use

        >>> import popen2, channel
        >>> p_out,p_in = popen2.popen2("python rpcex.py")
        >>> fch = channel.FileChannel(p_out,p_in)
        >>> pch = channel.PickleChannel(fch)
        >>> pch.send(("add",(3,4),{}))
        >>> pch.receive()
        7
        >>> pch.send(("sub",(3,4),{}))
        >>> pch.receive()
        -1
        >>>
  1162. Example: RPC

    • A slightly better client interface

        def do_rpc(ch,name,*args,**kwargs):
            ch.send((name,args,kwargs))
            return ch.receive()

    • Example:

        >>> do_rpc(pch,"add",3,4)
        7
        >>> do_rpc(pch,"sub",6,2)
        4
        >>>
  1163. Example: RPC

    • An even better client interface

        def rpc_func(ch,name):
            def do_rpc(*args,**kwargs):
                ch.send((name,args,kwargs))
                return ch.receive()
            return do_rpc

    • Example:

        >>> add = rpc_func(pch,"add")
        >>> sub = rpc_func(pch,"sub")
        >>> add(3,4)
        7
        >>> sub(3,4)
        -1
        >>>
  1164. Commentary

    • You could go even further down this route
    • We'll leave it at that for now
    • Also need to think about error checking (a really wicked problem in itself)
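As one illustration of going further, a proxy object can turn attribute lookups into remote calls, so a client writes `remote.add(3,4)`. This is a Python 3 sketch, not something from the slides: `RPCProxy` and `LoopbackChannel` are hypothetical names, and the loopback channel simply runs the call in-process so the example is self-contained.

```python
class LoopbackChannel:
    """Hypothetical in-process channel: send() performs the call, receive() returns it."""
    def __init__(self, funcs):
        self.funcs = funcs
        self.reply = None
    def send(self, request):
        name, args, kwargs = request
        self.reply = self.funcs[name](*args, **kwargs)
    def receive(self):
        return self.reply

class RPCProxy:
    """Any attribute not found on the proxy becomes a remote function."""
    def __init__(self, ch):
        self._ch = ch
    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so _ch is unaffected
        def do_rpc(*args, **kwargs):
            self._ch.send((name, args, kwargs))
            return self._ch.receive()
        return do_rpc

remote = RPCProxy(LoopbackChannel({
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
}))
print(remote.add(3, 4), remote.sub(3, 4))
```

The proxy needs no list of remote names in advance, which is exactly the kind of trick a dynamic language makes cheap.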
  1165. Message Summary

    • Message passing is a very powerful technique for setting up concurrent programs
    • Easily adapted to different I/O schemes
    • Can be extended across the network
    • Quite portable if done right
  1166. Concept: Threads

    • An independent task that runs inside of a process
    • Shares resources with the process (memory, files, etc.)
    • Has its own flow of execution (stack, PC)
  1167. Thread Basics

    • % python program.py
    • Program launch. Python loads a program and starts executing statements (the "main thread")

  1168. Thread Basics

    • Creation of a thread. Launches a function.
    • [Diagram: the main thread executes "create thread(foo)", which starts def foo()]

  1169. Thread Basics

    • Parallel execution of statements
    • [Diagram: the main thread and foo() each execute their own statements side by side]

  1170. Thread Basics

    • Thread terminates on return or exit
    • [Diagram: foo() reaches "return or exit" while the main thread continues executing statements]

  1171. Thread Basics

    • Key idea: A thread is like a little subprocess that runs inside your program
  1172. Creating a Thread

    • To create a thread, you define a class

        import time
        import threading
        class CountdownThread(threading.Thread):
            def __init__(self,count):
                threading.Thread.__init__(self)
                self.count = count
            def run(self):
                while self.count > 0:
                    print "Counting down", self.count
                    self.count -= 1
                    time.sleep(5)
                return

    • Inherit from Thread and redefine run()
  1173. threading module

    • To launch, create objects and use start()

        t1 = CountdownThread(10)    # Create the thread object
        t1.start()                  # Launch the thread
        t2 = CountdownThread(20)    # Create another thread
        t2.start()                  # Launch

    • Threads execute until the run() method returns or exits
  1174. Joining a Thread

    • It may be necessary to wait for a thread

        t.start()    # Launch a thread
        ...
        # Do other work
        ...
        # Wait for thread to finish
        t.join()     # Waits for thread t to exit

    • t.join([timeout])
    • Can only be used by other threads (a thread can't join itself)
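A runnable Python 3 version of the create/start/join sequence, as a minimal sketch. Two things differ from the slides and are assumptions for the sake of a quick demo: each thread records its counts in its own list (the `log` parameter) instead of printing, and the sleep is shortened to 0.01 seconds.

```python
import threading
import time

class CountdownThread(threading.Thread):
    def __init__(self, count, log):
        super().__init__()
        self.count = count
        self.log = log          # per-thread list, so nothing is shared

    def run(self):
        while self.count > 0:
            self.log.append(self.count)
            self.count -= 1
            time.sleep(0.01)

log1, log2 = [], []
t1 = CountdownThread(3, log1)   # Create the thread objects
t2 = CountdownThread(5, log2)
t1.start()                      # Launch both; they now run concurrently
t2.start()
t1.join()                       # Wait for each thread to exit
t2.join()
print(log1, log2)
```

After both `join()` calls return, the main thread can read the lists safely because neither worker is still running.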
  1175. Daemonic Threads

    • Creating a daemon thread (detached thread)

        t.setDaemon(True)

    • Daemon threads run forever
    • Like a background thread
    • Destroyed when the process exits
    • Not normally joined (the interpreter does not wait for them at exit)
    • Often used when creating worker/client threads
  1176. Thread Synchronization

    • Threads may share common data
    • Take extreme care when accessing shared data
    • One thread must not modify data while another thread is reading it
    • Otherwise, you get a "race condition"
  1177. Race Condition

    • Consider a shared object

        x = 0

    • And two threads

        Thread-1              Thread-2
        --------              --------
        ...                   ...
        x = x + 1             x = x - 1
        ...                   ...

    • It is possible that the value will be corrupted
    • This happens if one thread modifies the value just after the other has read it
  1178. Race Condition

    • The two threads

        Thread-1              Thread-2
        --------              --------
        ...                   ...
        x = x + 1             x = x - 1
        ...                   ...

    • Low level interpreter code

        Thread-1              Thread-2
        --------              --------
        push(x)               push(x)
        push(1)               push(1)
        add                   sub
        x = pop()             x = pop()

    • A context switch can occur between any two of these steps
    • A thread that resumes after the switch works with a stale value of x and overwrites the update made by the other thread
  1179. Race Condition

    • Is this a real concern or simply theoretical?

        >>> import threading
        >>> x = 0
        >>> def foo():
        ...     global x
        ...     for i in xrange(100000000): x += 1
        ...
        >>> def bar():
        ...     global x
        ...     for i in xrange(100000000): x -= 1
        ...
        >>> t1 = threading.Thread(target=foo)
        >>> t2 = threading.Thread(target=bar)
        >>> t1.start(); t2.start()
        >>> t1.join(); t2.join()
        >>> x
        -834018        <-- ???
        >>>
  1180. Mutex Locks

    • Mutual exclusion locks

        m = threading.Lock()    # Create a lock
        m.acquire()             # Acquire the lock
        m.release()             # Release the lock

    • If another thread tries to acquire the lock, it blocks until the lock is released
    • Only one thread may hold the lock
  1181. Use of Mutex Locks

    • Commonly placed around critical sections

        x = 0
        x_lck = threading.Lock()

        def foo():
            global x
            x_lck.acquire()
            x += 1              # Critical section
            x_lck.release()

        def bar():
            global x
            x_lck.acquire()
            x -= 1              # Critical section
            x_lck.release()
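A Python 3 sketch of the locked counter, run under contention. It uses the lock as a context manager (`with x_lck:`), which acquires on entry and releases on exit, including when an exception is raised. The iteration count `N` is an arbitrary choice for a quick demo.

```python
import threading

x = 0
x_lck = threading.Lock()
N = 50000                      # iterations per thread (arbitrary)

def foo():
    global x
    for _ in range(N):
        with x_lck:            # critical section: read-modify-write of x
            x += 1

def bar():
    global x
    for _ in range(N):
        with x_lck:            # critical section
            x -= 1

t1 = threading.Thread(target=foo)
t2 = threading.Thread(target=bar)
t1.start(); t2.start()
t1.join(); t2.join()
print(x)
```

Because every update happens while holding the lock, the increments and decrements cancel exactly, at the cost of one acquire/release per update.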
  1182. Other Locking Primitives

    • Reentrant Mutex Lock

        m = threading.RLock()         # Create a lock
        m.acquire()                   # Acquire the lock
        m.release()                   # Release the lock

    • Can be acquired multiple times by the same thread
    • Semaphores

        m = threading.Semaphore(n)    # Create a semaphore
        m.acquire()                   # Acquire the lock
        m.release()                   # Release the lock

    • Lock based on a counter
    • Won't cover in detail here
  1183. Events

    • Use to communicate between threads

        e = threading.Event()
        e.isSet()     # Return True if event set
        e.set()       # Set event
        e.clear()     # Clear event
        e.wait()      # Wait for event

    • Common use

        Thread 1                      Thread 2
        --------                      --------
        ...                           ...
        # Wait for an event           # Trigger an event
        e.wait()      <-- notify --   e.set()
        ...
        # Respond to event
  1184. Events/Multiple Threads

    • Events can work with multiple threads

        e = threading.Event()

        Thread 1: e.wait()    (blocked)
        Thread 2: e.wait()    (blocked)
        Thread 3: e.wait()    (blocked)
        Thread X: e.set()     (setting the event unblocks all waiting threads)
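The multi-waiter behaviour can be checked with a short Python 3 sketch. The `woken` list and its lock are additions for the demo so the main thread can see which waiters were released; they are not part of the slides.

```python
import threading

e = threading.Event()
woken = []
woken_lock = threading.Lock()

def waiter(name):
    e.wait()                    # blocks until some thread calls e.set()
    with woken_lock:
        woken.append(name)

threads = [threading.Thread(target=waiter, args=(i,)) for i in (1, 2, 3)]
for t in threads:
    t.start()

e.set()                         # one call releases every waiting thread
for t in threads:
    t.join()
print(sorted(woken))
```

The event stays set until `e.clear()` is called, so a thread that reaches `e.wait()` after the `set()` returns immediately.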
  1185. Programming with Threads

    • Must define parts of program that can run concurrently (may depend on algorithm)
    • Must identify all shared data structures
    • Must protect critical sections with locks
    • Synchronize threads with events as needed
    • Must cross fingers and hope that it works
  1186. Thread Pitfalls

    • Obscure race conditions (corner cases)
    • Deadlock (mismanagement of locks)

        def foo():
            lck.acquire()
            ...
            if expr:
                return      # Oops. Forgot to release lock
            ...
            lck.release()

    • More complicated development/debugging
    • Poor performance (excessive locking)
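One common way to avoid the forgotten release is to pair the acquire with `try`/`finally`, sketched here in Python 3. The return values and the `expr` parameter are placeholders for the demo; `with lck:` would be an equivalent, shorter spelling.

```python
import threading

lck = threading.Lock()

def foo(expr):
    lck.acquire()
    try:
        if expr:
            return "early"      # the finally clause still runs
        return "normal"
    finally:
        lck.release()           # released on every exit path

result = foo(True)
# A non-blocking acquire succeeds only if the lock was actually released
free_after = lck.acquire(blocking=False)
if free_after:
    lck.release()
print(result, free_after)
```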
  1187. Using Threads

    • If you must use threads, consider using the approach that causes the least amount of peril and pain
    • Independent threads that communicate via message queues

        [Diagram: Thread 1 and Thread 2 connected by a Queue]
  1188. Queue Example

    • Queue module in Python
    • Creating a Queue with a maximum number of elements

        import Queue
        q = Queue.Queue(maxsize)

    • To create an unbounded Queue

        import Queue
        q = Queue.Queue()
  1189. Queue Operations

    • To insert an item

        q.put(item)

    • Blocks until space is available
    • Removing items from a queue

        item = q.get()

    • If the queue is empty, blocks until data arrives
  1190. Producer-Consumer

        in_q = Queue.Queue()

    • Producer threads

        while True:
            # Produce an item
            ...
            # Send to the consumer
            in_q.put(item)

    • Consumer thread

        def consume_items():
            while True:
                item = in_q.get()
                # Consume the item
                ...
  1191. Producer-Consumer

    • An alternative formulation is to structure consumers as objects you "send" items to

        class Consumer(threading.Thread):
            def __init__(self):
                threading.Thread.__init__(self)
                self.in_q = Queue.Queue()
            def send(self,item):
                self.in_q.put(item)
            def run(self):
                while True:
                    item = self.in_q.get()
                    # Process item
                    ...

    • This ties threads to "message passing"
  1192. Producer-Consumer

    • Commentary on solution
    • No locks. Queue is thread-safe
    • No shared data. Producer/consumer only communicate via queue.
    • Strikingly similar to message passing
    • Code is simple
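A runnable Python 3 sketch of the consumer-as-object formulation (the module is named `queue` in Python 3). Two details are assumptions added so the example terminates: a `close()` method that sends a private sentinel object, and a `results` list in which the consumer stores the square of each item as its placeholder "processing".

```python
import queue
import threading

class Consumer(threading.Thread):
    _STOP = object()                    # hypothetical sentinel marking end of input

    def __init__(self):
        super().__init__()
        self.in_q = queue.Queue()
        self.results = []

    def send(self, item):
        self.in_q.put(item)

    def close(self):
        self.in_q.put(self._STOP)

    def run(self):
        while True:
            item = self.in_q.get()      # blocks until an item arrives
            if item is self._STOP:
                break
            self.results.append(item * item)   # placeholder processing

c = Consumer()
c.start()
for n in range(5):                      # the main thread acts as the producer
    c.send(n)
c.close()
c.join()
print(c.results)
```

No explicit lock appears anywhere: the queue is the only point of contact between the two threads, and `results` is read only after `join()`.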
  1193. Cost of Threads

    • Threads are sometimes considered for applications where there is massive concurrency (e.g., a server with thousands of clients)
    • However, threads are fairly expensive
    • Don't improve performance (context-switching)
    • Incur considerable memory overhead (each thread has its own C stack, etc.)
  1194. Problems with Threads

    • Dynamic languages often make very poor use of threads
    • The interpreters themselves are often not thread-safe (or are locked down in some way)
    • Example: Global interpreter lock in Python
    • As a result, even if you use threads, programs won't run on more than one CPU
  1195. Alternatives to Threads

    • Co-operating processes (better performance on multiple CPUs)
    • Event driven programming
    • Co-routines
  1196. Event Driven Systems

    • Programs structured as an event loop

        while True:
            event = get_event()
            if event.type == BUTTON_PRESS:
                do_button(event)
            elif event.type == MOUSE_MOVE:
                do_mousemove(event)
            elif event.type == KEYPRESS:
                do_keypress(event)
            elif event.type == FILE_INPUT:
                do_fileinput(event)
            elif event.type == NETWORK:
                do_network(event)
            ...
  1197. Event Driven Systems

    • In event driven systems, programs get built as a collection of functions/objects that react to different events
    • Classic example: GUIs
    • However, the same approach can be applied to networks, file I/O, etc.
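The "collection of functions that react to events" idea can be sketched in Python 3 with a dispatch table in place of the if/elif chain. Everything here is illustrative: the `Event` tuple, the `on()` decorator, the string event types, and the pre-built list of events stand in for whatever a real event source would deliver.

```python
from collections import deque, namedtuple

Event = namedtuple("Event", ["type", "data"])   # hypothetical event record
handlers = {}                                   # event type -> handler function
log = []

def on(event_type):
    """Decorator that registers a function as the handler for one event type."""
    def register(func):
        handlers[event_type] = func
        return func
    return register

@on("KEYPRESS")
def do_keypress(event):
    log.append("key " + event.data)

@on("MOUSE_MOVE")
def do_mousemove(event):
    log.append("mouse %d,%d" % event.data)

def run(events):
    """Event loop: take the next event and hand it to its registered handler."""
    pending = deque(events)
    while pending:
        event = pending.popleft()
        handler = handlers.get(event.type)
        if handler is not None:                 # unknown event types are ignored
            handler(event)

run([Event("KEYPRESS", "a"), Event("MOUSE_MOVE", (3, 4)), Event("UNKNOWN", None)])
print(log)
```

Adding a new kind of event then means registering one more function, with no change to the loop itself.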
  1198. Example: A GUI Button

    • Make a button (using Tk)

        >>> def response():
        ...     print "You did it!"
        ...
        >>> from Tkinter import Button
        >>> x = Button(None,text="Do it!",command=response)
        >>> x.pack()
        >>> x.mainloop()

    • Clicking on the button....

        You did it!
        You did it!
        ...
  1199. Co-routines

    • A technique sometimes used for implementing co-operative multitasking

        def do_foo():
            while True:
                # Various statements
                ...
                (yield)     # Yield control to someone else
                ...

    • Basic idea: Functions run until they explicitly yield to some other function
    • Only one thing runs at once, but it gives the illusion of concurrency
  1200. Coroutine Example

    • Python co-routine example

        def countdown(n):
            while n > 0:
                print "T-minus", n
                (yield)
                n = n - 1

    • This is like a generator, but we're not actually generating any values

        >>> c = countdown(10)
        >>> c.next()
        T-minus 10
        >>> c.next()
        T-minus 9
        >>>
  1201. Coroutine Example

    • Running multiple co-routines

        c1 = countdown(20)
        c2 = countdown(10)
        procs = [c1,c2]
        while procs:
            for p in list(procs):       # Iterate over a copy; procs is modified below
                try:
                    p.next()
                except StopIteration:
                    procs.remove(p)

    • This is an outer loop that is "scheduling" the different co-routines (round-robin)
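A Python 3 sketch of the same round-robin scheduling, using `next(task)` in place of `task.next()` and a deque as the ready queue. The `name` and `out` parameters and the `run_all()` function are additions for the demo so the interleaving can be inspected; the coroutine is assumed to stop when its count reaches zero.

```python
from collections import deque

def countdown(name, n, out):
    while n > 0:
        out.append((name, n))
        yield                   # hand control back to the scheduler
        n -= 1

def run_all(tasks):
    """Round-robin scheduler: run each task up to its next yield, in turn."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue            # task finished; do not reschedule it
        ready.append(task)      # still alive: back of the line

out = []
run_all([countdown("a", 3, out), countdown("b", 2, out)])
print(out)
```

Popping from the front and appending to the back avoids removing items from a list while looping over it.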
  1202. Coroutine Example

    • Example output

        T-minus 20
        T-minus 10
        T-minus 19
        T-minus 9
        T-minus 18
        T-minus 8
        T-minus 17
        T-minus 7
        T-minus 16
        T-minus 6
        T-minus 15
        T-minus 5
        T-minus 14
  1203. Next class

    • March 20!
    • Project presentations and wrap-up
    • No class, March 13.