New Features in Python 2

Presentation. O'Reilly Open Source Conference, 2001.

David Beazley

July 26, 2001

Transcript

  1. New Features in Python 2

    David M. Beazley
    Department of Computer Science
    University of Chicago
    [email protected]

    O'Reilly Open Source Conference
    July 26, 2001

    <<< O'Reilly OSCON 2001, New Features in Python 2, Slide 1 July 26, 2001, [email protected] >>>
  2. Introduction

    Python 2.0 (October 16, 2000)
    - The first major Python release in nearly two years (previous version: 1.5.2).

    Python 2.1 (April 17, 2001)
    - A substantial upgrade from 2.0.

    Changes include:
    - Substantial additions to the core language
    - New operators
    - Changes to semantics and scoping rules
    - New types
    - Unicode support
    - A variety of new library modules
    - Many enhancements to the run-time environment
  3. My Background

    Python software development
    - SWIG (used to build C/C++ extensions to Python, Perl, Tcl, etc.)
    - PLY (Python Lex-Yacc)
    - WAD (Wrapped Application Debugger)
    - Have also used Python with a variety of scientific applications.

    Author
    - Python Essential Reference, 2nd Ed. (New Riders Publishing)

    Educator
    - Have used Python in a variety of advanced computer science courses (networks, compilers).

    Lurker
    - An avid reader of the Python development mailing list.
  4. Resources

    Excellent overview of new features
    - "What's New in Python 2.x" by A.M. Kuchling and Moshe Zadka
    - Available at www.python.org

    Detailed list of changes
    - Misc/NEWS file in the Python source distribution

    PEPs (Python Enhancement Proposals)
    - http://python.sourceforge.net/peps
  5. Overview

    Prerequisites
    - I assume that you know something about Python.

    Approach
    - I'm mostly going to discuss changes to the core language.
    - New features since Python 1.5.2.
    - Will not cover changes to the C API or low-level internals.

    Outline
    - Changes to the core language and runtime environment
    - Unicode and internationalization
    - Additions to the standard library
    - Miscellaneous topics

    Please ask questions!
  6. Augmented Assignment

    New operators

        x += y      # x = x + y
        x -= y      # x = x - y
        x *= y      # x = x * y
        x /= y      # x = x / y
        x <<= y     # x = x << y
        x >>= y     # x = x >> y
        x %= y      # x = x % y
        x &= y      # x = x & y
        x |= y      # x = x | y
        x ^= y      # x = x ^ y
        x **= y     # x = x ** y

    Example

        i = 0
        while i < 100:
            i += 1
  7. Augmented Assignment

    Comments
    - Augmented assignment does not violate immutability. For immutable objects such as strings, it does not modify the object in place:

        s = "Hello"
        s += "World"     # Creates a new string "HelloWorld" and binds it to s
                         # Does not modify the contents of the existing string

    - Can be applied to any valid l-value:

        s[i] += y         # s[i] = s[i] + y
        s.x += y          # s.x = s.x + y
        s.x[i][j] += y    # s.x[i][j] = s.x[i][j] + y

    - Still no x++ or x-- operators (and they are unlikely to be added in the future).

    Overloading
    - Augmented assignment operators can be overloaded with the special methods __iadd__(self,other), __isub__(self,other), ...
    - Allows in-place modification and special interfaces.
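
To see the difference between rebinding and in-place modification side by side, here is a small sketch in present-day Python 3 syntax (not 2.1). It contrasts a string, which += rebinds, with a list, whose __iadd__ extends the existing object; the Accumulator class is a made-up example of a user type that defines __iadd__ itself.

```python
# Strings are immutable: += builds a new string and rebinds the name.
s = "Hello"
alias_s = s
s += "World"
print(s, alias_s)                 # HelloWorld Hello

# Lists implement __iadd__, so += extends the same list object in place.
t = [1, 2]
alias_t = t
t += [3]
print(t, alias_t)                 # [1, 2, 3] [1, 2, 3]

# A user-defined class opts in to in-place updates by defining __iadd__.
class Accumulator:
    def __init__(self):
        self.items = []

    def __iadd__(self, other):
        self.items.append(other)
        return self               # this return value is rebound to the name

acc = Accumulator()
before = acc
acc += "a"
acc += "b"
print(acc is before, acc.items)   # True ['a', 'b']
```

Note that __iadd__ must return the object to bind; returning None would leave the name bound to None.
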
  8. Augmented Assignment

    Overloading example

        # sig.py (imported below as sig)
        import md5

        # An alternative interface for making MD5 signatures
        class MD5:
            def __init__(self):
                self.sig = md5.new()
            def __str__(self):
                return self.sig.digest()
            def __iadd__(self,other):
                self.sig.update(other)
                return self

        >>> import sig
        >>> m = sig.MD5()
        >>> m += "Hello"
        >>> m += "World"
        >>> str(m)
        'h\341\011\360\364\014\247*\025\340\\\302'\206\370\346'
        >>> m += "Python"
        >>> str(m)
        '&\321\025;4m\254\216\223\375\030\333\264\263;\250'
  9. List Comprehensions

    List handling
    - Many programs involve a significant amount of list manipulation.
    - Common problem: constructing new lists from an existing list.

        s = [ "3", "4", "10", "11", "12"]
        t = [ ]
        for n in s:
            t.append(int(n))

    map(func,s)
    - Previous versions provided the map() function.

        s = [ "3", "4", "10", "11", "12"]
        t = map(int,s)

    List comprehensions
    - A convenient syntax for creating new lists from existing sequences.

        s = [ "3", "4", "10", "11", "12"]
        t = [int(x) for x in s]
  10. List Comprehensions

    General syntax

        a = [ expr for x1 in s1
                   for x2 in s2
                   ...
                   for xn in sn
                   if fexpr ]

    Expanded version

        a = [ ]
        for x1 in s1:
            for x2 in s2:
                ...
                for xn in sn:
                    if fexpr:
                        a.append(expr)

    Comments
    - The for operations are nested (not performed in parallel).
    - The syntax is a little mind-blowing at first.
    - However, it is easy to use once you figure it out.
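
The equivalence between the general syntax and the expanded loops can be checked directly. This sketch (Python 3 syntax; the data is arbitrary) builds the same list both ways, using two for clauses and one if clause.

```python
# Two for clauses plus a filter, written as a comprehension...
pairs = [(x, y) for x in range(3) for y in "ab" if x != 1]

# ...and the equivalent nested loops from the expanded version.
expanded = []
for x in range(3):
    for y in "ab":
        if x != 1:
            expanded.append((x, y))

print(pairs)
# [(0, 'a'), (0, 'b'), (2, 'a'), (2, 'b')]
```
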
  11. List Comprehensions

    Examples

        a = [ 1, 4, -10, 20, -2, -5, 8 ]
        b = [ 2*x for x in a ]            # b = [2, 8, -20, 40, -4, -10, 16]
        c = [ x for x in a if x > 0 ]     # c = [1, 4, 20, 8]
        d = [ float(x) for x in f.readlines() ]

    Creating tuples

        d = [ (x,y) for x in a for y in b ]

    - The parentheses in (x,y) are required, in contrast to ordinary tuples, where they can be omitted.

        # Syntax error
        d = [ x,y for x in a for y in b ]
  12. List Comprehensions

    How to shoot yourself in the foot
    - Iteration variables used in list comprehensions are bound in the enclosing ("outer") scope.

        for i in s:
            ...
            t = [2*i for i in r]    # Overwrites outer i
            # Corrupted value of i
            ...

    - It is very easy to forget this fairly annoying side effect.

    How to make heads explode (courtesy of Tim Peters)

        >>> d = range(3)
        >>> x = [None] * 3
        >>> base3 = [x[:] for x[0] in d for x[1] in d for x[2] in d]
        >>> base3
        [[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0], [0, 1, 1], [0, 1, 2],
         [0, 2, 0], [0, 2, 1], [0, 2, 2], ...
  13. zip()

    New built-in function: zip

        zip(s1,s2,s3,...,sn)

    - Creates a list of tuples where each tuple contains an element from each si.

        a = [ 1,2,3 ]
        b = [ 10,11,12 ]
        c = zip(a,b)     # c = [ (1,10), (2,11), (3,12) ]

    - The resulting list is truncated to the length of the shortest sequence in s1, s2, ..., sn.

    Contrast with map(None,a,b)

        a = [1,2,3]
        b = [10,11,12,13]
        c = zip(a,b)          # c = [(1,10), (2,11), (3,12) ]
        d = map(None,a,b)     # d = [(1,10), (2,11), (3,12), (None,13) ]
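
In current Python 3, zip() returns an iterator (hence the list() call) and map(None, ...) no longer pads. The helper zip_padded below is a hypothetical stand-in written for illustration; it reproduces the padding behavior described on the slide for sequences that support len().

```python
def zip_padded(*seqs):
    """Hypothetical helper mimicking Python 2's map(None, a, b) padding."""
    longest = max(len(s) for s in seqs)
    # Missing positions in shorter sequences are filled with None.
    return [tuple(s[i] if i < len(s) else None for s in seqs)
            for i in range(longest)]

a = [1, 2, 3]
b = [10, 11, 12, 13]

print(list(zip(a, b)))     # [(1, 10), (2, 11), (3, 12)]
print(zip_padded(a, b))    # [(1, 10), (2, 11), (3, 12), (None, 13)]
```
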
  14. New Function Call Syntax

    Functions with variable length arguments

        # Python 1.5 version
        def wrap_foo(*pargs,**kwargs):
            print "Calling foo"
            apply(foo,pargs,kwargs)

    Python 2.0 provides a new syntax that replaces apply()

        # Python 2.0 version
        def wrap_foo(*pargs,**kwargs):
            print "Calling foo"
            foo(*pargs,**kwargs)

    Also works in combination with other arguments

        def foo(w,x,y,z):
            ...
        a = (3,4)
        foo(1,2,*a)      # Same as foo(1,2,3,4)

    Note: *a and **b arguments must appear last.
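
The extended call syntax makes a transparent wrapper straightforward. A minimal sketch in Python 3 syntax, where the calls list is an assumption added only so the forwarded arguments can be observed:

```python
calls = []

def foo(w, x, y, z):
    return w + x + y + z

def wrap_foo(*pargs, **kwargs):
    # Record the call, then forward every argument unchanged.
    calls.append((pargs, kwargs))
    return foo(*pargs, **kwargs)

a = (3, 4)
print(wrap_foo(1, 2, *a))                    # 10, same as foo(1, 2, 3, 4)
print(wrap_foo(1, 2, **{"y": 3, "z": 4}))    # 10, same as foo(1, 2, y=3, z=4)
```
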
  15. Function and Method Attributes

    In Python 2.1, functions and methods can now have attributes

        def foo(x):
            ...
        foo.secure = 1
        foo.private = 0

        if foo.secure:
            foo(x)

    - Attributes are stored in the underlying __dict__ attribute (as with classes).

    Primary use
    - Storing additional information about a function or method.
    - Useful in parser generators, network applications, etc.

    Note: there is no way to initialize attributes in the function definition

        # This does not work
        def foo(x):
            foo.secure = 1
            ...

        if foo.secure:
            foo(1)       # AttributeError: secure not defined
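
One way function attributes get used is as metadata that other code consults. In this sketch (Python 3 syntax), dispatch(), AuthError, and the meaning of the secure flag are all hypothetical; getattr() with a default covers functions that never had the attribute set.

```python
class AuthError(Exception):
    """Hypothetical error raised for protected handlers."""

def status(request):
    return "ok"

def shutdown(request):
    return "stopping"

# Attach metadata after the definition, as on the slide.
shutdown.secure = 1

def dispatch(handler, request, authenticated):
    # getattr() supplies a default for handlers that never set the attribute.
    if getattr(handler, "secure", 0) and not authenticated:
        raise AuthError("%s requires authentication" % handler.__name__)
    return handler(request)

print(dispatch(status, None, authenticated=False))    # ok
print(dispatch(shutdown, None, authenticated=True))   # stopping
print(shutdown.__dict__)                              # {'secure': 1}
```
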
  16. Nested Scopes

    In Python 2.1, nested functions define nested scopes
    - An optional feature that must be enabled using:

        from __future__ import nested_scopes

    Example

        def foo():
            x = 1
            def bar():
                print x      # Use of nonlocal variable
            while x < 10:
                bar()
                x += 1

    In previous versions of Python
    - This code generates a NameError exception (x not defined).
    - No notion of an "enclosing" scope (only local and global scope).

    In Python 2.1: it works!
  17. Nested Scopes

    Nested scopes clean a few things up

    Python 2.1:

        def make_adder(x):
            def adder(n):
                return x+n
            return adder

        add2 = make_adder(2)
        print add2(4)        # Prints '6'

    Older versions (default argument hack)

        def make_adder(x):
            def adder(n, x=x):     # x set as default argument
                return x + n
            return adder

        add2 = make_adder(2)
        print add2(4)
  18. Nested Scopes

    Nested scopes and lambda

    Older versions of Python

        def foo():
            a = 0.5
            b = 1.5
            c = 2
            d = 4
            # Call a function with a callback
            plot_function(lambda x,a=a,b=b,c=c,d=d: a*(x**3) + b*(x**2) + c*x + d)

    Python 2.1:

        def foo():
            a = 0.5
            b = 1.5
            c = 2
            d = 4
            plot_function(lambda x: a*(x**3) + b*(x**2) + c*x + d)
  19. Nested Scopes

    Variable access and assignment
    - An inner function can access variables defined in enclosing scopes.
    - However, it cannot modify their values.
    - Assignment always binds a name in the local scope or the global scope.

        x = 2
        def foo():
            a = 1
            x = 3
            def bar():
                global x
                a = 10       # Creates local a
                x = 11       # Modifies global x
            bar()
            print "a =",a
            print "x =",x

        foo()        # prints "a = 1", "x = 3"
        print x      # prints "11"
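
Because assignment inside an inner function never rebinds a variable of the enclosing function, a common workaround is to keep mutable state in a container and modify its contents. A sketch with a hypothetical make_counter() function (Python 3 syntax, but using only the 2.1-style technique):

```python
def make_counter():
    # Rebinding `count` inside increment() would create a new local name,
    # so the state lives in a one-element list that is mutated instead.
    count = [0]

    def increment():
        count[0] += 1
        return count[0]

    return increment

counter = make_counter()
counter()
counter()
print(counter())        # 3
```

Python 3 later added the nonlocal statement (PEP 3104), which allows rebinding an enclosing variable directly.
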
  20. Nested Scopes

    Implementation
    - Static scoping: unbound names in inner functions only bind to names statically defined in the outer function.
    - This binding is determined when the code is compiled into Python byte-code.

        def foo():
            def bar():
                print x      # x bound to outer function
                print y      # y not known, use normal scoping rules (global)
            x = 4

    - Names cannot be generated dynamically:

        def foo():
            def bar():
                print x
            exec "x = 4"     # Whoa!
  21. Nested Scopes

    Incompatibilities
    - exec and execfile()

        # This Python code is illegal
        def foo():
            x = 1
            def bar(a):
                return x + a
            exec "x = x + 1"

        SyntaxError: unqualified exec is not allowed in function 'foo'
        it contains a nested function with free variables

    - Use of dynamic features is problematic in inner functions.

        # This code doesn't work
        def foo(y):
            x = 1
            def bar(a):
                return eval("x+a")     # name 'x' not defined
            return bar(y)
  22. Rich Comparisons

    Prior to Python 2.1
    - Comparison of objects was handled through a special __cmp__(self,other) method.
    - Returns negative if self < other, 0 if self == other, positive if self > other.

    Python 2.1
    - Each comparison operator (<, <=, >, >=, ==, !=) can be defined individually.
    - __lt__(), __le__(), __gt__(), __ge__(), __eq__(), and __ne__() special methods.
    - Methods can return any value, raise exceptions, etc.

    Primary applications
    - Comparison of types for which only a subset of operators makes sense.
      Example: complex numbers. You can test for equality, but < and > are mathematically meaningless.
    - Matrices and vectors. May want to compute comparisons of individual elements.
  23. Rich Comparisons

    Example: Complex numbers

        class Complex:
            def __init__(self,r,i):
                self.real = r
                self.imag = i
            def __eq__(self,other):
                if self.real == other.real and \
                   self.imag == other.imag:
                    return 1
                return 0
            def __ne__(self,other):
                return not self.__eq__(other)
            def __lt__(self,other):
                raise TypeError, "can't compare with <, <=, >, >="
            # The name Complex is not defined yet while the class body runs,
            # so alias the function object directly
            __le__ = __lt__
            __ge__ = __lt__
            __gt__ = __lt__
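
For comparison, the same class in Python 3 syntax, where raise takes an exception instance and booleans replace the 1/0 return values. The loop over the operator module's functions is only there to check that all four ordering operators refuse to compare.

```python
import operator

class Complex:
    def __init__(self, r, i):
        self.real = r
        self.imag = i

    def __eq__(self, other):
        return self.real == other.real and self.imag == other.imag

    def __ne__(self, other):
        return not self.__eq__(other)

    def __lt__(self, other):
        raise TypeError("can't compare with <, <=, >, >=")

    # Reuse the same function for the other ordering operators.
    __le__ = __ge__ = __gt__ = __lt__

a = Complex(1, 2)
b = Complex(0, 0)
print(a == Complex(1, 2), a != Complex(1, 3))   # True True

errors = []
for op in (operator.lt, operator.le, operator.gt, operator.ge):
    try:
        op(a, b)
    except TypeError:
        errors.append(op)
print(len(errors))                              # 4
```
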
  24. Rich Comparisons

    Example: Element-wise list comparison

        import UserList

        class MyList(UserList.UserList):
            def __eq__(self,other):
                return map(lambda x,y: x==y, self,other)
            def __ne__(self,other):
                return map(lambda x,y: x!=y, self,other)
            def __lt__(self,other):
                return map(lambda x,y: x<y, self,other)
            def __le__(self,other):
                return map(lambda x,y: x<=y, self,other)
            def __gt__(self,other):
                return map(lambda x,y: x>y, self,other)
            def __ge__(self,other):
                return map(lambda x,y: x>=y, self,other)

        a = MyList([ 3, 7, 10, -2])
        b = MyList([ 2, 5, 15, 20])
        c = a < b                      # c = [0, 0, 1, 1]
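
In Python 3 the UserList and map(lambda ...) idioms can be replaced with a list subclass and zip(). This sketch (the class name ElementwiseList is made up) returns a list of booleans rather than the 1/0 values shown on the slide.

```python
import operator

class ElementwiseList(list):
    """List whose comparison operators work element by element."""

    def _compare(self, other, op):
        # Apply the comparison to corresponding elements.
        return [op(x, y) for x, y in zip(self, other)]

    def __eq__(self, other): return self._compare(other, operator.eq)
    def __ne__(self, other): return self._compare(other, operator.ne)
    def __lt__(self, other): return self._compare(other, operator.lt)
    def __le__(self, other): return self._compare(other, operator.le)
    def __gt__(self, other): return self._compare(other, operator.gt)
    def __ge__(self, other): return self._compare(other, operator.ge)

a = ElementwiseList([3, 7, 10, -2])
b = ElementwiseList([2, 5, 15, 20])
print(a < b)    # [False, False, True, True]
```

One design difference: zip() stops at the shorter list, whereas Python 2's map() padded the shorter sequence with None.
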
  25. String Methods

    Strings now have methods
    - Replaces functionality in the string module.

        s.capitalize()
        s.center(width)
        s.count(sub [,start [,end]])
        s.find(sub [,start [,end]])
        s.isalnum()
        s.join(t)
        s.lower()
        s.replace(old,new [,maxreplace])
        ...

    Example

        s = "Hello World"
        t = s.upper()      # t = "HELLO WORLD"
        a = s.split()      # a = ['Hello', 'World']
  26. String Methods

    Comments
    - String methods never modify the underlying string.
    - Methods always return a new string (if applicable).

    An odd example

        t = ['Hello','World']

        # Old string module
        s = string.join(t,":")     # s = "Hello:World"

        # String method
        s = ":".join(t)            # s = "Hello:World"

    The string module
    - Reimplemented to use string methods.
    - Officially deprecated, but still in widespread use.
  27. Garbage Collection

    Prior to Python 2.0
    - Cyclical data structures caused memory leaks.
    - Reference counting alone has no way to reclaim their memory.

        a = [ ]
        b = [ a ]
        a.append(b)
        del a
        del b        # a, b still allocated

    - A problem for certain data structures (graphs, trees, etc.)

    Python 2.0 adds garbage collection of cycles
    - Containers (lists, tuples, dictionaries) are placed on an internal list.
    - A periodic scan of this list detects and removes unreferenced cyclical data.
    - Properties of garbage collection can be controlled using the gc module.
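
The cycle collector can be observed with a weak reference. This sketch assumes CPython; it uses instances of a throwaway Node class (rather than lists, which cannot be weakly referenced) and disables automatic collection so that only the explicit gc.collect() call reclaims the cycle.

```python
import gc
import weakref

class Node:
    pass

gc.disable()                  # keep the automatic collector out of the demo

# Build a two-node cycle: a -> b -> a.
a = Node()
b = Node()
a.other = b
b.other = a

probe = weakref.ref(a)        # observes a without keeping it alive
del a, b                      # refcounts stay above zero because of the cycle
alive_after_del = probe() is not None

gc.collect()                  # the cycle detector finds and frees the pair
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)   # True False
```
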
  28. Garbage Collection

    Implementation
    - Three-level generational scheme.
    - Newly created objects are placed in generation 0.
    - A garbage collection step checks generation 0 and moves surviving objects to generation 1 (checked less frequently).
    - Objects surviving a check of generation 1 move to generation 2.
    - Long-lived objects are checked infrequently.

    The problem with __del__()
    - Instances of classes that define a __del__() method are not collected when they are part of a cycle.
    - Instead, they are placed on a list of uncollectable objects.
    - Problem: finalization. One object's __del__() might reference the other object.
    - There is no way to determine the proper order of destruction.
  29. Garbage Collection

    The gc module
    - Provides an API for controlling the garbage collector.

        import gc

        gc.disable()                   # Turn off garbage collection
        gc.enable()                    # Enable garbage collection
        gc.collect()                   # Run full garbage collection step
        gc.set_threshold(1000,10,10)   # Set frequency of garbage collection

        # Print list of uncollectable objects
        print gc.garbage

        # Print debugging information to find memory leaks
        gc.set_debug(gc.DEBUG_LEAK)
  30. Extended Print Statement

    New syntax for printing to a file

        f = open("foo","w")
        print >>f, "Hello World"
        for i in range(10):
            print >>f, "i = ",i
        f.close()

    Previously, print only printed to sys.stdout

        oldstdout = sys.stdout
        sys.stdout = open("foo","w")
        print "Hello World"
        for i in range(10):
            print "i = ",i
        sys.stdout.close()
        sys.stdout = oldstdout

    Any file-like object can be used
    - Built-in files, StringIO, etc.
  31. Modified Import

    Can supply new names for modules

        import socket as sock
        s = sock.socket(sock.AF_INET,sock.SOCK_STREAM)
        ...
        from string import replace as rep
        rep(s,"foo","bar")

    __all__ attribute
    - A module can explicitly control its list of exports for "from module import *".

        def foo(): ...
        def bar(): ...
        __all__ = ["foo"]

    Case sensitivity
    - Python 2.1 provides case-sensitive import on case-insensitive platforms.
    - Example: Windows is case-preserving, but case-insensitive.
  32. Warning Framework

    Python 2.1 introduces warnings
    - Informational messages issued at runtime.
    - The primary intent is to inform users of deprecated/problematic features.

        >>> import regex
        __main__:1: DeprecationWarning: the regex module is deprecated; please use the re module
        >>> def foo():
        ...     def bar():
        ...         print x
        ...     exec "x = 1"
        ...
        <stdin>:1: SyntaxWarning: unqualified exec is not allowed in function 'foo' it contains a nested function with free variables
        >>>

    - Unlike exceptions, control does not stop. Only a message is printed.
  33. Warning Framework

    Types of warnings

        Warning               # Base class for all warnings
        UserWarning           # Default warning
        DeprecationWarning    # Deprecated feature
        SyntaxWarning         # Use of dubious syntax features
        RuntimeWarning        # Use of dubious runtime features

    - Each is also derived from Exception.

    Issuing a warning
    - warnings module: warn(message [, category])

        import warnings
        warnings.warn("Hey, I'm warning you...")                     # UserWarning
        warnings.warn("x is deprecated, use y", DeprecationWarning)
  34. Warning Framework

    The warning filter
    - Handling of each warning message can be modified.
    - The warning filter specifies actions for individual warnings.

    Actions

        'error'      Turn warning into exception
        'ignore'     Ignore the warning
        'always'     Always print the warning message
        'once'       Print warning message only once
        'default'    Print warning message once for each location where it is issued
        'module'     Print warning message once for each module where it is issued

        warnings.filterwarnings(action [, message [, category [, module [, lineno]]]])

        action   = One of the above actions
        message  = Regular expression matching warning message
        category = DeprecationWarning, SyntaxWarning, etc.
        module   = Regular expression matching module name
        lineno   = Line number matching location of warning
  35. Warning Framework

    Warning filter examples

        # Ignore all deprecation warnings
        warnings.filterwarnings('ignore','.*',DeprecationWarning)
        warnings.filterwarnings('ignore', category=DeprecationWarning)

        # Ignore deprecation warnings created by this module
        warnings.filterwarnings('ignore', category=DeprecationWarning, module=__name__)

        # Turn SyntaxWarnings into exceptions
        warnings.filterwarnings('error', category=SyntaxWarning)

    -Waction:message:category:module:lineno option

        python -Wignore::DeprecationWarning
        python -Wignore::DeprecationWarning:foobar
        python -Werror::SyntaxWarning
        python -Wignore        # Ignore all warnings
        python -Werror         # Turn all warnings into errors
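
A runnable version of the filter examples, with one assumption: it uses warnings.catch_warnings(), which was added to the module later (Python 2.6) and restores the filter list on exit, so the changes stay local. old_api() is a hypothetical deprecated function.

```python
import warnings

def old_api():
    warnings.warn("old_api is deprecated, use new_api", DeprecationWarning)
    return 42

# 'error' action: the warning is raised as an exception.
with warnings.catch_warnings():
    warnings.filterwarnings("error", category=DeprecationWarning)
    try:
        old_api()
        escalated = False
    except DeprecationWarning as w:
        escalated = True
        print("raised:", w)

# 'ignore' action: the warning is suppressed and the call completes.
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    result = old_api()
```
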
  36. __future__

    New language features are now enabled by a special module

        from __future__ import nested_scopes

    - The intent is to first introduce features that might break old code as optional.
    - Features enabled by __future__ will eventually be enabled by default.
    - To help find problems, warnings are generated for problematic code.

        >>> def foo():
        ...     def bar():
        ...         print x
        ...     exec "x = 1"
        <stdin>:1: SyntaxWarning: unqualified exec is not allowed in function 'foo' it contains a nested function with free variables
        >>> from __future__ import nested_scopes
        >>> def foo():
        ...     def bar():
        ...         print x
        ...     exec "x = 1"
        SyntaxError: unqualified exec is not allowed in function 'foo' it contains a nested function with free variables
  37. Weak References

    Reference counting
    - Normally, all Python objects are managed through reference counting.
    - Reference counts are modified by assignment, deletion, scopes, etc.

        a = Object()       # refcnt = 1
        b = a              # refcnt = 2
        c["foo"] = a       # refcnt = 3
        del b              # refcnt = 2

    Weak reference
    - A mechanism for referring to an object without increasing its reference count.

        import weakref
        a = Object()
        b = weakref.ref(a)       # Create weak reference to a
        ...
        obj = b()                # Dereference b
        if obj is not None:
            # Do something
        else:
            # Object is gone!
  38. Weak References

    Creating a weak reference
    - weakref module

        weakref.ref(object [,callback])

    - Creates a weak reference to object.
    - callback executes when object is deleted (its argument is the weak reference).

        class Foo: pass

        def cleanup(x):
            print x, "deleted"

        a = Foo()
        b = weakref.ref(a, cleanup)
        ...
        r = b()      # Dereference. Returns a or None.
        ...
        del a        # Might cause cleanup() to be called

    - To dereference, simply call the weak reference as a function.
    - Returns the original object, or None if it no longer exists.
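
A runnable sketch of the callback behavior, assuming CPython, where dropping the last strong reference frees the object immediately (other implementations may delay it). The deleted list is added only to record what the callback received.

```python
import weakref

class Foo:
    pass

deleted = []

def cleanup(ref):
    # Called with the (now dead) weak reference, not the object itself.
    deleted.append(ref)

a = Foo()
b = weakref.ref(a, cleanup)
print(b() is a)          # True: dereferencing returns the live object

del a                    # last strong reference gone
print(b() is None)       # True
print(deleted[0] is b)   # True: the callback received the weak reference
```
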
  39. Weak References

    Applications
    - Caching of previously computed results

        import weakref

        def foo(x,cache={}):
            wr = cache.get(x,None)
            if wr is None or wr() is None:
                r = compute_foo(x)            # Compute result
                cache[x] = weakref.ref(r)     # Create weak reference to it
                return r
            else:
                return wr()                   # Return previous result

    - A weak reference allows the original object to go away when it is no longer in use.

    Caveats
    - Weak references only work with instances, functions, and methods.
    - Weak references to strings, lists, dictionaries, etc. are not currently supported.
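
The weakref module also provides WeakValueDictionary, which drops an entry automatically when its value is collected, so the cache needs no manual liveness check. A sketch in Python 3 syntax; Result and compute_foo() are hypothetical stand-ins, and the immediate recomputation at the end relies on CPython's reference counting.

```python
import weakref

class Result:
    """Stand-in for an expensive, weak-referenceable result object."""
    def __init__(self, value):
        self.value = value

compute_calls = []

def compute_foo(x):
    # Hypothetical expensive computation; records each real call.
    compute_calls.append(x)
    return Result(x * x)

_cache = weakref.WeakValueDictionary()

def foo(x):
    r = _cache.get(x)
    if r is None:
        r = compute_foo(x)
        _cache[x] = r          # entry vanishes once r is no longer referenced
    return r

first = foo(4)
second = foo(4)                # cache hit while `first` is alive
print(first is second, compute_calls)   # True [4]

del first, second              # drop the last strong references
third = foo(4)                 # recomputed
print(compute_calls)           # [4, 4]
```
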
  40. Weak Proxies

    Weak proxy
    - A wrapper around a weakly referenced instance.

        >>> import weakref
        >>> import UserDict
        >>> d = UserDict.UserDict()
        >>> wd = weakref.proxy(d)
        >>> wd["spam"] = "eggs"
        >>> wd["michael"] = "ellis"
        >>> del d
        >>> wd["spanish"] = "inquisition"
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
        weakref.ReferenceError: weakly-referenced object no longer exists
        >>>
  41. Minor Changes

    New dictionary methods

    d.setdefault(key [, value])

        >>> a = { }
        >>> a.setdefault('foo','bar')     # Sets value 'bar'
        >>> a.setdefault('foo','spam')    # Returns set value 'bar'

    d.popitem()

        >>> a = { 'hello':'world', 'x':3}
        >>> a.popitem()       # Remove arbitrary item
        ('x', 3)
        >>> a.popitem()       # Remove arbitrary item
        ('hello', 'world')
        >>> a
        {}
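
A common use of setdefault() is grouping items under a key without first testing whether the key exists. A small sketch in Python 3 syntax with made-up data:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

groups = {}
for w in words:
    # setdefault() stores [] the first time a key is seen and returns
    # the list stored under that key either way.
    groups.setdefault(w[0], []).append(w)

print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```
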
  42. Minor Changes

    New file method f.xreadlines()
    - Allows fast iteration over lines without reading the entire file.

        f = open("foo")
        i = 0
        for s in f.xreadlines():
            print "%5d: %s" % (i,s),
            i += 1
  43. Minor Changes

    Interactive display hook
    - sys.displayhook(obj)
    - Used to print results evaluated in the interactive interpreter.

        >>> 3+4
        7
        >>> def foo(x):
        ...     print "result =", x
        >>> sys.displayhook = foo
        >>> 3+4
        result = 7
        >>>
  44. Minor Changes

    Uncaught exception hook
    - sys.excepthook(type, value, traceback)
    - Called when an uncaught exception reaches the top level of the interpreter.
    - Can be used to provide customized output (more debugging information).

        >>> def uncaught(type,value,tb):
        ...     print "Guru meditation error. %x" % id(type)
        ...
        >>> sys.excepthook = uncaught
        >>> s
        Guru meditation error. cc724

    More practical use: specialized error handling
    - CGI scripts, embedded systems, debuggers, etc.
  45. Minor Changes

    Execution of .pyc and .pyo files
    - Not sure when this was added.

        $ python foo.pyc
        $ python foo.pyo

    Various output changes
    - repr(s) now uses standard escape codes for certain characters.

        >>> print repr("hello\n")
        'hello\n'

    - Several minor changes to the string format operator %.
  46. Unicode

    Python 2.0 provides support for Unicode strings
    - Needed for internationalization, XML, etc.

    Unicode in a nutshell
    - Internally, all character values are extended to 16-bit integers (a C short or wchar_t).
    - Character values 0-127 represent the same characters as ASCII.
    - Otherwise, everything is about the same (well, mostly).

    Issues
    - How do you specify Unicode strings in a program? (You can't type most of the characters.)
    - External representation and I/O.
    - Compatibility with 8-bit strings (comparison, coercion, regular expressions, etc.)

    Note: When discussing Unicode, U+xxxx is used to indicate a Unicode character value.
    Ex: U+006A. This is not Python syntax.
  47. Unicode

    Unicode literals
    - Precede the string literal with u.

        a = "Hello"      # 8-bit string   : 48 65 6c 6c 6f
        b = u"Hello"     # Unicode string : 0048 0065 006c 006c 006f

    Specifying Unicode characters
    - Use \uxxxx.

        c = u"\u10f2\u0455"
        d = u"\u0048\u0065\u006c\u006c\u006f"

    Raw Unicode strings

        e = ur"M\u00fcller\n"

    Comments
    - Python source files are 8-bit ASCII.
    - Unicode string literals are not written in any special encoding (UTF-8, UTF-16, etc.)
    - Raw strings are a little strange. All escape codes are left uninterpreted except for \u.
    - The -U option makes all string literals Unicode (the default in the future??)
  48. Unicode - External Representation

    The encoding problem
    - Often have to read Unicode strings from files and other byte-streams.
    - What data encoding do you use?

        a = u"Hello"       # Unicode: 0048 0065 006c 006c 006f
        f = open("foo","w")
        f.write(a)         # ????

    Little endian encoding (least significant byte first)

        48 00 65 00 6c 00 6c 00 6f 00

    Big endian encoding (most significant byte first)

        00 48 00 65 00 6c 00 6c 00 6f

    Variable length encoding? (ex. UTF-8)

        48 65 6c 6c 6f
  49. Unicode - Encodings

    Python provides the following string encoding types

        'ascii'                    # 7-bit ASCII (0-127)
        'latin-1', 'iso-8859-1'    # 8-bit extended ASCII (0-255)
        'utf-8'                    # 8-bit variable length encoding
        'utf-16'                   # 16-bit variable length encoding
        'utf-16-le'                # 16-bit little endian
        'utf-16-be'                # 16-bit big endian
        'unicode-escape'           # Format used in u"xxxxx" literals
        'raw-unicode-escape'       # Format used in ur"xxxxx" literals

    To encode: s.encode([encoding [,errors]])

        >>> s = u"Hello"
        >>> s.encode('utf-8')
        'Hello'
        >>> s.encode('utf-16-le')
        'H\000e\000l\000l\000o\000'
        >>> s.encode('utf-16-be')
        '\000H\000e\000l\000l\000o'
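
A round-trip check in Python 3 syntax, where text is str and encoded data is bytes. The example string is the one used on the error-handling slide; the loop confirms that decoding with the same encoding recovers the original text, while the byte counts show how each encoding represents it.

```python
s = "M\u00fcller"                       # a str is Unicode text in Python 3

for name in ["utf-8", "utf-16-le", "utf-16-be", "latin-1"]:
    data = s.encode(name)               # text -> bytes
    assert data.decode(name) == s       # bytes -> text gives back the original
    print(name, len(data), data)

print("Hello".encode("utf-16-le"))      # b'H\x00e\x00l\x00l\x00o\x00'
```
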
  50. Unicode - Decoding

    Decoding
    - Conversion of raw byte-streams back into Unicode strings.

        unicode(s [, encoding [, errors]])

        >>> e = 'H\000e\000l\000l\000o\000'
        >>> unicode(e,'utf-16-le')
        u'Hello'
        >>> unicode('hello', 'utf-8')
        u'hello'
        >>>

    - Of course, to properly decode a string, you need to know what encoding was used.
    - Usually, this is obtained elsewhere (e.g., a MIME header):

        Content-Type: text/plain; charset=utf-8
  51. Unicode

    Error handling
    - Encoding/decoding errors result in a UnicodeError exception.
    - Behavior can be controlled with the optional errors parameter:

        'strict'     # Errors cause UnicodeError exception
        'ignore'     # Errors are silently ignored
        'replace'    # Bad characters replaced by special replacement character

        >>> a = u"M\u00fcller"
        >>> a.encode('ascii')
        UnicodeError: ASCII encoding error: ordinal not in range(128)
        >>> a.encode('ascii','ignore')
        'Mller'
        >>> a.encode('ascii','replace')
        'M?ller'
  52. Unicode - I/O

    I/O in a nutshell
    - To write data, it must be encoded in some external format.

        u = ...                  # Big Unicode string
        f = open("foo","w")
        f.write(u.encode('utf-8'))
        f.close()

    - To read data, it must be decoded.

        f = open("foo")
        u = unicode(f.read(),'utf-8')

    - Unfortunately, explicit decoding/encoding is awkward and error prone.
    - Solution: use the codecs module.
  53. Unicode - I/O

    codecs module
    - Provides a convenient interface for encoding, decoding, and file I/O.
    - General idea: perform a codec lookup

        import codecs
        encoder, decoder, reader, writer = codecs.lookup('utf-8')

    - Returns a tuple of functions.
    - encoder/decoder functions are low-level functions that work with partial data.
    - reader/writer functions provide wrappers around other file objects.

    Example: reading a file

        f = reader(open("foo"))
        u = f.read()
        f.close()

    In practice
    - The data encoding method is often embedded in the file itself.
    - Based on the encoding method, a codec would be selected to perform the actual reading and writing.
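
codecs.getreader() and codecs.getwriter() are shortcuts for the stream reader and writer entries returned by codecs.lookup(). This sketch (Python 3 syntax) uses io.BytesIO as an in-memory stand-in for a real file.

```python
import codecs
import io

text = "M\u00fcller\n"

# Wrap an in-memory byte stream with a UTF-8 writer: write() takes text.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
writer.write(text)
encoded = raw.getvalue()            # the UTF-8 bytes actually stored

# Wrap the bytes with a UTF-8 reader: read() returns text again.
reader = codecs.getreader("utf-8")(io.BytesIO(encoded))
decoded = reader.read()
print(encoded, decoded == text)     # b'M\xc3\xbcller\n' True
```
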
  54. Unicode - I/O

    codecs example: autodetection of XML encoding
    - An XML document starts with <?xml ...
    - The encoding can be determined by looking at the first few bytes of input:

        3C 3F 78 6D      # UTF-8, ASCII, Latin-1
        3C 00 3F 00      # UTF-16-LE
        00 3C 00 3F      # UTF-16-BE
        ...

    Use of codec

        import codecs

        encodings = { '\x3c\x3f\x78\x6d' : 'utf-8',
                      '\x3c\x00\x3f\x00' : 'utf-16-le',
                      '\x00\x3c\x00\x3f' : 'utf-16-be' }

        f = open("foo.xml","rb")
        reader = codecs.lookup(encodings[f.read(4)])[2]
        f.seek(0)          # Rewind so the reader also sees the first 4 bytes
        fr = reader(f)
        ...
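
The detection idea can be packaged as a function. A sketch in Python 3 syntax; open_xml_text() is a hypothetical name, the UTF-8 fallback for unrecognized signatures is an assumption, and byte-order marks are not handled.

```python
import codecs
import io

# First four bytes of "<?xm" in each encoding (the table on the slide).
XML_SIGNATURES = {
    b"\x3c\x3f\x78\x6d": "utf-8",
    b"\x3c\x00\x3f\x00": "utf-16-le",
    b"\x00\x3c\x00\x3f": "utf-16-be",
}

def open_xml_text(stream):
    """Return (text reader, encoding) for a binary XML stream."""
    head = stream.read(4)
    encoding = XML_SIGNATURES.get(head, "utf-8")   # assumed fallback
    stream.seek(0)                                 # reader must see the declaration too
    return codecs.getreader(encoding)(stream), encoding

doc = '<?xml version="1.0"?><package name="swig"/>'
results = []
for enc in ["utf-8", "utf-16-le", "utf-16-be"]:
    reader, guessed = open_xml_text(io.BytesIO(doc.encode(enc)))
    results.append((guessed, reader.read() == doc))

print(results)
# [('utf-8', True), ('utf-16-le', True), ('utf-16-be', True)]
```
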
  55. Unicode and Standard Strings

    Mixing with standard strings
    - Unicode strings and standard strings can be mixed together in operators (+, %, etc.),
      dictionary keys, string methods, and built-in functions and modules.

    Examples

        a = "Hello"
        b = u"World"
        c = a + b

    General approach
    - When mixed in an operator, standard strings are always coerced to Unicode.

        c = unicode(a) + b

    - When standard strings are expected, Unicode is encoded into an 8-bit string.

        f = open(b)      # f = open(b.encode())
  56. Unicode and Standard Strings

    Default encoding
    - Default encoding/decoding is determined at interpreter startup.
    - Can be obtained from sys.getdefaultencoding().
    - Default is usually 'ascii'.
    - Can be changed in site.py or sitecustomize.py. However, this is a good way to get strange program behavior.

    Comments
    - Mixing of Unicode and standard strings mostly works as you would expect.
    - If strings contain identical characters, they compare as equal, have the same hash value, etc.
    - May get occasional UnicodeErrors when converting.
    - Performance is obviously worse if many conversions are performed.
  57. Unicode Character Properties

    Unicode Character Property Database
    - unicodedata module
    - Provides information about character properties (capitalization, numeric values, etc.)

        >>> import unicodedata
        >>> unicodedata.category(u'A')
        'Lu'
        >>> unicodedata.category(u'4')
        'Nd'
        >>> unicodedata.numeric(u'\u2155')     # \u2155 is fraction (1/5)
        0.2
        >>>
  58. Structured Markup Processing

    Python 2.1 provides extensive support for XML and related markup: xml.dom, xml.dom.minidom, xml.dom.pulldom, xml.sax, xml.sax.handler, xml.sax.saxutils, xml.sax.xmlreader, xml.parsers.expat, xmllib (deprecated), sgmllib, htmllib, htmlentitydefs.
    This is a huge topic, so I'm only going to discuss it briefly. There are a variety of HOWTOs and tutorials on the web.
  59. XML in a Nutshell

    It's like HTML, but with user-definable elements (well, mostly). Example:
        <package name="swig">
          <version>1.3</version>
          <homepage>http://www.swig.org</homepage>
          <author>David Beazley</author>
        </package>
    DTDs: a DTD is a formal specification of the elements and attributes that are allowed.
    XML parsing:
        Non-validating parsers check that the document is well-formed, but don't verify it against the DTD.
        Validating parsers check that the document is well-formed and that it complies with the DTD.
    Python mostly provides support for non-validating parsers.
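As an illustration, the package example can be read with xml.dom.minidom, one of the non-validating parsers in the module list. This sketch targets a current Python 3 interpreter. The helpers child_text() and read_package() are hypothetical.

```python
from xml.dom import minidom

PACKAGE_XML = """<package name="swig">
  <version>1.3</version>
  <homepage>http://www.swig.org</homepage>
  <author>David Beazley</author>
</package>"""

def child_text(parent, tag):
    """Join the text nodes inside the first <tag> element below parent."""
    node = parent.getElementsByTagName(tag)[0]
    return "".join(n.data for n in node.childNodes if n.nodeType == n.TEXT_NODE)

def read_package(xml_text):
    """Parse the document into a tree and pull out the package fields."""
    doc = minidom.parseString(xml_text)
    pkg = doc.documentElement                    # the <package> element
    return {
        "name": pkg.getAttribute("name"),
        "version": child_text(pkg, "version"),
        "homepage": child_text(pkg, "homepage"),
        "author": child_text(pkg, "author"),
    }

if __name__ == "__main__":
    print(read_package(PACKAGE_XML))
```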
  60. XML Parsing

    Two common approaches: event-driven parsing (SAX) and the Document Object Model (DOM).
    SAX (Simple API for XML): read through the XML document, trigger different functions for each element/entity as it is encountered, and build up a data structure as you go.
    DOM: the entire XML document is stored in memory as a big tree. Data is accessed by tree traversal, which allows easy modification and manipulation of subtrees.
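The event-driven style can be sketched with xml.sax in a current Python 3 interpreter. The handler below builds up a dictionary as start, character and end events arrive, rather than keeping a tree in memory. PackageHandler and parse_package() are hypothetical names. The element names assume the package example from the previous slide.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class PackageHandler(ContentHandler):
    """Collect the text of each child of <package> as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None      # name of the element whose text we are reading
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "package":
            self.fields["name"] = attrs.get("name")
        else:
            self._current = name
            self._buffer = []

    def characters(self, content):
        if self._current is not None:
            self._buffer.append(content)   # text may arrive in several chunks

    def endElement(self, name):
        if name == self._current:
            self.fields[name] = "".join(self._buffer)
            self._current = None

def parse_package(xml_bytes):
    handler = PackageHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.fields
```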
  61. Networking Improvements

    socket module: support for raw sockets and for OpenSSL clients.
        from socket import *
        s = socket(AF_INET, SOCK_STREAM)
        s.connect(("www.blah.com", 443))
        ss = ssl(s, None, None)      # ssl(sock, keyfile, certfile)
    Other network modules (urllib, urllib2, httplib, etc.) support OpenSSL where applicable.
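In current Python 3, a TLS client socket is built with the separate ssl module rather than a function in socket. The sketch below shows one way to do that. The helper open_tls_connection() and its timeout default are assumptions. ssl.create_default_context() enables certificate verification and host-name checking.

```python
import socket
import ssl

def open_tls_connection(host, port=443, timeout=10.0):
    """Connect to (host, port) and return a TLS-wrapped socket.

    The default context verifies the server certificate against the system's
    trusted CAs and checks that it matches `host`.
    """
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()          # don't leak the plain socket if the handshake fails
        raise
```

For example, `with open_tls_connection("www.python.org") as s:` yields a socket whose sendall() and recv() carry encrypted traffic. The host name is only an illustration.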
  62. Memory Mapped Files

    mmap module: provides access to memory-mapped files and the mmap() system call.
        import mmap
        import os
        f = os.open("foo", os.O_RDWR)
        m = mmap.mmap(f, 32768, mmap.MAP_SHARED, mmap.PROT_READ | mmap.PROT_WRITE)
        m[10] = 'x'
        print m[200:300]
        m.move(10000, 20000, 2000)
        ...
        m.close()
    Supports anonymous shared memory, private/shared mappings, etc.
    Comments: the underlying file has to be opened with the same permissions as the mapping. Mappings usually only work in page-sized regions (mmap.PAGESIZE). Also works on Windows (with a slightly different API).
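A portable variant for a current Python 3 interpreter is sketched below. It uses the access=mmap.ACCESS_WRITE argument, accepted on both Unix and Windows, instead of the Unix-only flags/prot pair. A length of 0 maps the whole (non-empty) file. The helpers patch_in_place() and copy_within() are hypothetical.

```python
import mmap
import os
import tempfile

def patch_in_place(path, offset, data):
    """Overwrite len(data) bytes at offset through a shared, writable mapping."""
    with open(path, "r+b") as f:                  # read/write, matching the mapping
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:  # 0 = whole file
            m[offset:offset + len(data)] = data   # slice assignment keeps the length
            m.flush()                             # write dirty pages back to the file

def copy_within(path, dest, src, count):
    """Copy count bytes from src to dest inside the file, like m.move()."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m.move(dest, src, count)
            m.flush()

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"." * mmap.PAGESIZE)            # the file must not be empty
    os.close(fd)
    patch_in_place(path, 10, b"x")
    copy_within(path, 20, 10, 1)
    with open(path, "rb") as f:
        print(f.read(32))
    os.remove(path)
```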
  63. Internationalization

    gettext module: provides an API compatible with GNU gettext, used to internationalize applications using a database of translated strings.
        import gettext, getpass
        gettext.bindtextdomain("myapp", "./locale")
        gettext.textdomain("myapp")
        _ = gettext.gettext
        pw = getpass.getpass(_("password:"))
        if pw != correct:                  # correct: the expected password, defined elsewhere
            print _("Authorization failed.\n")
            raise SystemExit
    General idea:
        Wrap strings to be translated in the special _(...) function.
        Use Tools/i18n/pygettext.py to extract the strings into a special file.
        Modify the file by supplying translations of the strings.
        Build a translation database using Tools/i18n/msgfmt.py and drop the result in the locale directory.
  64. Other Library Enhancements

    _winreg module: access to the Windows registry.
    zipfile: decoding/encoding of PKZIP zip files (common on Windows).
    webbrowser: portable API for launching a web browser.
    Cookie: HTTP cookie processing, useful in CGI programming.
    atexit: improved interface for registering cleanup actions for program termination.
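As a small example of the zipfile module, the sketch below builds and reads an archive entirely in memory on a current Python 3 interpreter. The helpers zip_bytes() and unzip_bytes() are hypothetical. ZIP_DEFLATED assumes the interpreter was built with zlib, which is the usual case.

```python
import io
import zipfile

def zip_bytes(files):
    """Build a PKZIP archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def unzip_bytes(blob):
    """Return {name: bytes} for every member of an in-memory archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

if __name__ == "__main__":
    blob = zip_bytes({"hello.txt": b"Hello, world\n"})
    print(len(blob), unzip_bytes(blob))
```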
  65. Other Library Enhancements

    os module: a more complete set of POSIX system calls. The functionality of the popen2 module is now included:
        popen(command [, mode [, bufsize]])
        popen2(command [, mode [, bufsize]])
        popen3(command [, mode [, bufsize]])
        popen4(command [, mode [, bufsize]])
    re module: support for Unicode strings and a few new regular expression patterns.
    random module: extensive changes and cleanup of the API; the old whrandom module is deprecated.
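The popen2/popen3/popen4 family was later superseded by the subprocess module and removed in Python 3. The sketch below shows, for a current Python 3 interpreter, a rough equivalent of popen3's stdin/stdout/stderr plumbing. The helper run_capture() is hypothetical.

```python
import subprocess
import sys

def run_capture(args, input_text=None):
    """Run a command and return (exit status, stdout, stderr) as text.

    Feeds input_text to the child's stdin and captures both output streams,
    which is roughly what os.popen3() provided.
    """
    result = subprocess.run(args, input=input_text, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr

if __name__ == "__main__":
    print(run_capture([sys.executable, "-c", "print('hello')"]))
```

Passing an argument list rather than a shell string avoids shell quoting problems.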
  66. Other Library Enhancements

    time module: many functions now have optional arguments.
        print time.asctime()
        # compare to
        print time.asctime(time.localtime(time.time()))
    UserString module: a subclassable wrapper around string objects.
    filecmp module: functions for comparing files and directories.
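As an example of the filecmp module, the sketch below compares two files by content on a current Python 3 interpreter. The helper same_contents() is hypothetical. With shallow=False, filecmp.cmp() reads the file contents instead of accepting matching os.stat() signatures (type, size, modification time) as proof of equality.

```python
import filecmp
import os
import tempfile

def same_contents(path_a, path_b):
    """True if the two files hold identical bytes (contents are actually read)."""
    return filecmp.cmp(path_a, path_b, shallow=False)

if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    paths = []
    for name, data in (("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"diff")):
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    print(same_contents(paths[0], paths[1]), same_contents(paths[0], paths[2]))
```

For whole directories, filecmp.dircmp() reports files that are common, differing, or present on only one side.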
  67. pydoc

        $ pydoc md5
        Python Library Documentation: module md5

        NAME
            md5

        FILE
            /usr/local/lib/python2.1/lib-dynload/md5.so

        DESCRIPTION
            This module implements the interface to RSA's MD5 message digest
            algorithm (see also Internet RFC 1321). Its use is quite
            straightforward: use the new() to create an md5 object. You can now
            feed this object with arbitrary strings using the update() method, and
            at any point you can ask it for the digest (a strong kind of 128-bit
            checksum, a.k.a. ``fingerprint'') of the concatenation of the strings
            fed to it so far using the digest() method.

            Functions:

            new([arg]) -- return a new md5 object, initialized with arg if provided
            md5([arg]) -- DEPRECATED, same as new, but for compatibility

            Special Objects:

            MD5Type -- type object for md5 objects
  68. pydoc (continued)

        FUNCTIONS
            md5(...)
                new([arg]) -> md5 object

                Return a new md5 object. If arg is present, the method call
                update(arg) is made.

            new(...)
                new([arg]) -> md5 object

        DATA
            MD5Type =
            __file__ = '/usr/local/lib/python2.1/lib-dynload/md5.so'
            __name__ = 'md5'
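The md5 module documented above was later folded into hashlib and removed in Python 3. The sketch below exercises the same incremental update()/digest() protocol on a current Python 3 interpreter. The helper md5_hex() is hypothetical. The expected digests are the RFC 1321 test values for "" and "abc".

```python
import hashlib

def md5_hex(*chunks):
    """Feed each chunk to one md5 object; return the digest of their concatenation."""
    m = hashlib.md5()
    for chunk in chunks:
        m.update(chunk)       # equivalent to hashing all chunks joined together
    return m.hexdigest()

if __name__ == "__main__":
    print(md5_hex(b"a", b"bc"))
```

The same documentation can be viewed for the modern module with `python -m pydoc hashlib`.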
  69. Distutils

    Python 2.0 and Python 2.1 both provide the distutils module, a tool for distributing Python packages and extension modules. It is particularly useful for extension code and is capable of producing RPMs and other types of packages.
    Python 2.1 is partially built with distutils: many formerly built-in modules are now built as dynamically loadable extensions. This decreases the size of the Python executable and improves startup time.
  70. New Development Process

    PEPs (Python Enhancement Proposals), http://python.sourceforge.net/peps: information on prior enhancements, new directions, and wild ideas.
    The future? Look at the PEPs.