Advanced geoprocessing with…
MAGIC 2012
Chad Cooper – [email protected]
Center for Advanced Spatial Technologies
University of Arkansas, Fayetteville
Slide 2
Intros
• your name
• what you do/where you work
• used Python much?
– any formal training?
– what do you use it for?
• know any other languages?
Slide 3
Objectives
• informal class
– expect tangents
– code as we go
• not geared totally to ArcGIS
• THINK – oddball and out-of-the-ordinary applications will make you want more…
Slide 4
Outline
• data types review
• functions
• procedural vs. OOP
• geometries
• rasters
• spatial references
• error handling/logging
• documentation
• 3rd party modules
• module installation
• the web
– fetching
– scraping
– email
– FTP
• files
Slide 5
Strings
• ordered collections of characters
• immutable – can’t change it
• raw strings:
• slicing
• indexing:
• iteration/membership
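A minimal sketch of the string operations listed above, written for Python 3. The sample values (path, name) are made up for illustration.

```python
# Raw string: backslashes are kept literally, handy for Windows paths
path = r"C:\temp\new_data"

name = "Arkansas"
first = name[0]            # indexing is zero based
last = name[-1]            # negative indexes count from the end
head = name[:3]            # slicing returns a new string
has_kan = "kan" in name    # membership test

# Iteration visits one character at a time
letters = [ch for ch in name]

# Strings are immutable: item assignment raises TypeError
try:
    name[0] = "a"
except TypeError:
    changed = False
else:
    changed = True
```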
Lists
• ordered collection of arbitrary objects
• mutable – you can change it in place
• extend() concatenates lists
Slide 8
Lists…
• iterable – very important!
• membership
• nestable – 2D array/matrix
• access by index – zero based
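A short sketch of the list behavior described on these two slides, in Python 3. The list contents are sample values only.

```python
layers = ["roads", "parcels"]
layers.append("streams")                 # adds one item to the end
layers.extend(["wells", "soils"])        # concatenates another list

nested = [[1, 2, 3], [4, 5, 6]]          # 2D array/matrix as a list of lists
middle = nested[1][0]                    # row 1, column 0

found = "wells" in layers                # membership test
for layer in layers:                     # iteration
    print(layer)
```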
Slide 9
Dictionaries
• unordered collection of arbitrary objects (dicts keep insertion order as of Python 3.7)
• key/value pairs – think hash/lookup table (keys don’t have to be numbers)
• nestable, mutable
• access by key, not offset
Slide 10
Dictionaries
• iterable
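A small sketch of dictionary access and iteration, in Python 3. The keys and counts are invented sample data.

```python
counts = {"roads": 120, "parcels": 4500}
counts["streams"] = 310              # mutable: add a new key
n_roads = counts["roads"]            # access by key, not offset
missing = counts.get("wells")        # get() returns None instead of raising KeyError

# Nestable: values can be dictionaries too
nested = {"roads": {"count": 120, "type": "polyline"}}

for layer, count in counts.items():  # items() yields (key, value) pairs
    print("%s: %d" % (layer, count))
```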
Slide 11
Tuples
• ordered collection of arbitrary objects
• immutable – cannot add or remove elements
• access by offset
• basically an unchangeable list
• so what’s the purpose?
– FAST – great for iterating over a constant set of values
– SAFE – you can’t change it
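A quick sketch of tuple behavior, in Python 3. The extent values are a hypothetical bounding box, not real data.

```python
extent = (-94.6, 33.0, -89.6, 36.5)   # hypothetical xmin, ymin, xmax, ymax

xmin = extent[0]                       # access by offset
x0, y0, x1, y1 = extent                # unpack into separate names
width = x1 - x0

# Tuples are immutable: item assignment raises TypeError
try:
    extent[0] = 0
except TypeError:
    modified = False
else:
    modified = True
```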
Slide 12
List comprehensions
• Map one list to another by applying a function to each of the list elements
• Original list is left unchanged
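A minimal sketch of a list comprehension, in Python 3, using made-up layer names. It also shows the optional filter clause.

```python
names = ["roads", "parcels", "streams"]

upper = [n.upper() for n in names]              # apply a function to each element
long_names = [n for n in names if len(n) > 5]   # optional filter clause

# names itself is untouched; each comprehension builds a new list
```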
Slide 13
Sets
• unordered collections of objects
• like mathematical sets – collection of distinct
objects – NO DUPLICATES
• example – get rid of dups in a list via list comp
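One possible way to drop duplicates with a list comprehension, in Python 3. This is a sketch and not necessarily the example shown in class; it keeps the first occurrence of each value and preserves order.

```python
values = [3, 1, 3, 2, 1]

# Keep a value only if it has not appeared earlier in the list
unique = [v for i, v in enumerate(values) if v not in values[:i]]
```

This rescans the earlier part of the list for every element, so it is fine for short lists but slow for long ones.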
Slide 14
Sets
• get rid of dups via set:
• union:
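A short sketch of both bullets, in Python 3, with sample values. Note that converting to a set does not preserve the original order.

```python
values = [3, 1, 3, 2, 1]
unique = set(values)                 # duplicates collapse automatically

a = {"roads", "parcels"}
b = {"parcels", "streams"}
both = a | b                         # union, same as a.union(b)
```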
Slide 15
Sets
• intersection – elements found in both sets
• symmetric difference – elements found in exactly one of the two sets
• difference – elements in first set but not second
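The three operations above, sketched in Python 3 with sample integers.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

common = a & b        # intersection, same as a.intersection(b)
either = a ^ b        # symmetric difference, same as a.symmetric_difference(b)
only_a = a - b        # difference, same as a.difference(b)
```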
Slide 16
Programming paradigms:
big blob of code
• OK on a small scale for GP scripts
• gets out of hand quickly
• hard to follow
• think ModelBuilder-exported code
Slide 17
Programming paradigms:
procedural programming
• basically a list of instructions
• program is built from one or more procedures
(functions) – reusable chunks
• procedures can be called at any time, from anywhere in the program
• focus is on breaking a task into a collection of variables, data structures, and subroutines
• natural style, easy to understand
• strict separation between code and data
Slide 18
Functions
• portion of code within a larger program that
performs a specific task
• can be called anytime, anyplace
• can accept arguments
• should return a value
• keeps code neat
• promotes smooth flow
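A minimal sketch of two small functions, in Python 3. The function names are hypothetical; the point is that each accepts arguments, returns a value, and can be reused by other code.

```python
def feet_to_meters(feet):
    """Convert a length in feet to meters."""
    return feet * 0.3048


def total_length_m(lengths_ft):
    """Sum a list of lengths given in feet and return the total in meters."""
    return sum(feet_to_meters(f) for f in lengths_ft)


total = total_length_m([100, 200])
```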
Slide 19
Functions
Slide 20
Programming paradigms:
Procedural example
Slide 21
Programming paradigms:
Object-oriented programming (OOP)
• break program down into data types (classes)
that associate behavior (methods) with data
(members or attributes)
• code becomes more abstract
• data and functions for dealing with it are
bound together in one object
• objects let you wrap complex processes, but
present a simple interface to them
• methods and attributes are encapsulated
inside the object
• methods and attributes are exposed to users
• you can then update the object without
breaking the interface
• you can pass objects around - carefully
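A small sketch of a class that binds data (attributes) to behavior (methods), in Python 3. The Parcel class and its fields are hypothetical, made up for illustration.

```python
class Parcel(object):
    """A land parcel with an ID and an area in acres."""

    def __init__(self, parcel_id, acres):
        self.parcel_id = parcel_id      # attributes hold the data
        self.acres = acres

    def hectares(self):
        # Methods wrap the work; callers only see a simple interface
        return self.acres * 0.40468564224

    def describe(self):
        return "%s: %.1f ac" % (self.parcel_id, self.acres)


lot = Parcel("A-1", 10)
summary = lot.describe()
```

Because callers only use hectares() and describe(), the internals could later change (for example, storing square meters) without breaking them.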
Programming paradigms:
Object-oriented programming (OOP)
Slide 24
Programming paradigms:
OOP - Inheritance
• classes can inherit attributes and methods
• allows you to reuse and customize existing
code inside a new class
• you can override methods
• you can add new methods to a class without
modifying the existing class
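A sketch of inheritance, in Python 3. Feature and Road are hypothetical classes: Road reuses the Feature constructor, overrides one method, and adds a new one without touching Feature.

```python
class Feature(object):
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Feature %s" % self.name


class Road(Feature):
    def __init__(self, name, lanes):
        super(Road, self).__init__(name)    # reuse the parent's setup
        self.lanes = lanes

    def describe(self):                     # override an inherited method
        base = super(Road, self).describe()
        return "%s (%d lanes)" % (base, self.lanes)

    def is_multilane(self):                 # new method; Feature is unchanged
        return self.lanes >= 4


road = Road("Main St", 4)
label = road.describe()
```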
Slide 25
Programming paradigms:
OOP - Inheritance
Slide 26
Programming paradigms:
OOP - Inheritance
Slide 27
Modularizing code
• I’m lazy, so I want to reuse code
• import statement – call functionality in another module
• Have one custom module (a .py file) with code
you use all the time
• Great way to package up helper functions
• ESRI does this with ConversionUtils.py
C:\Program Files (x86)\ArcGIS\Server10.0\ArcToolBox\Scripts
Slide 28
Geometries
• hierarchy:
– feature class is made of features
– feature is made of parts
– part is made of points
• hierarchy in Pythonic terms:
– part:
– multipart polygon:
– single part polygon with hole:
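One way to picture the hierarchy is with nested lists, sketched below in Python 3. This is an illustration only: plain (x, y) tuples stand in for arcpy Point objects, the coordinates are made up, and None marks the break between an exterior ring and an interior ring, mirroring the null point arcpy uses for that purpose.

```python
# A part is a list of points
part = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]

# A multipart polygon is a list of parts
multipart_polygon = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)],
    [(20, 20), (20, 30), (30, 30), (30, 20), (20, 20)],
]

# A single-part polygon with a hole: one part, rings separated by None
polygon_with_hole = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0),
     None,
     (4, 4), (6, 4), (6, 6), (4, 6), (4, 4)],
]


def count_rings(polygon):
    """Count rings across all parts (hypothetical helper for this sketch)."""
    return sum(1 + p.count(None) for p in polygon)
```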
Slide 29
Reading geometry
• accessed through the geometry object of a
feature
• example: describe_geometry_arcmap.py
1. open up a SearchCursor
2. loop through rows
3. get geometry
4. print out X, Y
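The four steps can be sketched roughly as follows. This is not the describe_geometry_arcmap.py script itself, just a minimal outline that assumes ArcGIS 10.x with arcpy available and a line or polygon feature class; the function name print_vertices is made up. arcpy is imported inside the function so the file still loads on a machine without ArcGIS.

```python
def print_vertices(feature_class):
    """Print the X, Y of every vertex in a line or polygon feature class."""
    import arcpy  # assumption: ArcGIS 10.x is installed

    shape_field = arcpy.Describe(feature_class).ShapeFieldName
    rows = arcpy.SearchCursor(feature_class)          # 1. open up a SearchCursor
    for row in rows:                                  # 2. loop through rows
        geom = row.getValue(shape_field)              # 3. get geometry
        for part_index in range(geom.partCount):
            for pnt in geom.getPart(part_index):
                if pnt:                               # a null point separates rings
                    print("%s, %s" % (pnt.X, pnt.Y))  # 4. print out X, Y
    del rows                                          # release the cursor
```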
Slide 30
Reading geometry
Slide 31
Reading geometry
Slide 32
Reading geometry
Slide 33
Writing geometry
•
• point features are point objects, lines and
polygons are arrays of point objects
–
• Geometry objects can be created using the Geometry, Multipoint, PointGeometry, Polygon, or Polyline classes
Slide 34
Writing geometry
Slide 35
Writing geometry
Slide 36
Rasters
• Raster class
– raster object: variable that references a raster
dataset
– gives access to raster props
• raster calculations – Map Algebra
–
– can cast to Raster object for calculations
Slide 37
Rasters
Slide 38
Spatial references
• can get properties from
• SpatialReference class
• methods to create/edit spatial refs
Slide 39
Spatial references
• SpatialReference class
• methods to create/edit spatial refs
Slide 41
Exception Handling
• It’s necessary, stuff fails
• Useful error reporting
• Proper application cleanup
• Combine it with logging
Slide 42
Exception handling – try/except
• most basic form of error handling
• wrap whole program or portions of code
• use the optional finally clause for cleanup
– close open files
– close database connections
– check extensions back in
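A minimal sketch of try/except with a cleanup clause (finally), in Python 3. The function name and file path are hypothetical; the path is assumed not to exist so the except branch runs.

```python
import os
import tempfile


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    handle = None
    try:
        handle = open(path)
        return handle.readline().strip()
    except IOError as err:
        print("Could not read %s: %s" % (path, err))
        return None
    finally:
        # Runs whether or not an exception occurred: close the open file
        if handle is not None:
            handle.close()


missing_path = os.path.join(tempfile.gettempdir(), "no_such_file_for_demo.txt")
result = read_first_line(missing_path)
```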
Slide 43
Exception handling
Slide 44
Exception handling
Slide 45
Exception handling - raise
• allows you to force an exception to occur
• can be used to alert the caller to problem conditions
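A short sketch of raise, in Python 3. The function name and the validation rule are made up for illustration.

```python
def buffer_distance(value):
    """Validate a buffer distance and return it as a float."""
    if value <= 0:
        # Force an exception so bad input is caught early
        raise ValueError("buffer distance must be positive, got %r" % value)
    return float(value)


try:
    buffer_distance(-5)
except ValueError as err:
    message = str(err)
```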
Slide 46
Exception handling - raise
Slide 47
Exception handling
AddError and traceback
• – returns GP-specific errors
• traceback – prints stack trace; determines precise location of error
– good for larger, more complex programs
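A minimal sketch of the standard-library traceback module, in Python 3. The risky function is a made-up stand-in for a failing step. Inside a script tool the same text could be handed to arcpy.AddError, which is not done here so the sketch runs without ArcGIS.

```python
import traceback


def risky():
    return 1 / 0


try:
    risky()
except ZeroDivisionError:
    # format_exc() returns the stack trace as a string,
    # including the file, line number, and function that failed
    report = traceback.format_exc()
    print(report)
```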
Slide 48
Exception handling –
AddError and traceback
Slide 49
Logging
• logging module
• logging levels:
– DEBUG: detailed; for troubleshooting
– INFO: normal operation, statuses
– WARNING: still working, but unexpected behavior
– ERROR: more serious, some function not working
– CRITICAL: program cannot continue
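A small sketch of how the level threshold filters messages, in Python 3. The logger name and messages are made up, and output goes to an in-memory stream only so the result is easy to inspect.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("levels_demo")      # hypothetical logger name
logger.setLevel(logging.WARNING)               # WARNING and above get through
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.debug("cursor opened")                  # below threshold, dropped
logger.info("processing 10 features")          # below threshold, dropped
logger.warning("feature 7 has null geometry, skipping")
logger.error("could not write output table")
logger.critical("workspace missing, giving up")

output = stream.getvalue()
```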
Slide 50
Super-basic logging
Slide 51
Super-basic logging to a log file
Slide 52
Super-basic logging to a log file
Slide 53
Meaningful logging
• “customize” the logger
• add in info-level message(s) to get logged
• log our errors to log file
• can get much more advanced, see the docs
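A sketch of a customized logger that writes info messages and errors to a log file, in Python 3. The logger name, log file name, and format string are assumptions chosen for illustration.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "gp_script.log")  # hypothetical path

logger = logging.getLogger("gp_script")        # hypothetical logger name
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("script started")
try:
    result = 10 / 0
except ZeroDivisionError:
    # exception() logs at ERROR level and appends the stack trace
    logger.exception("calculation failed")
logger.info("script finished")

# Flush and detach the handler so the file can be read back
handler.close()
logger.removeHandler(handler)

with open(log_path) as f:
    log_text = f.read()
```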
Slide 54
Meaningful logging
Slide 55
Meaningful logging
Slide 56
Code documentation
• Pythonic standards covered in PEPs 8 and 257
• help()
• comments need to be worth it
• name items well
• be precise and compact
• comments may be for you
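A minimal sketch of a docstring in the spirit of PEP 257, in Python 3. The function is a made-up example; calling help(acres_to_hectares) at the prompt would display the same text.

```python
def acres_to_hectares(acres):
    """Return the area in hectares for an area given in acres.

    acres -- area in acres (int or float)
    """
    # 1 acre is 0.40468564224 hectares
    return acres * 0.40468564224


doc = acres_to_hectares.__doc__
```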
Slide 57
Creating documentation
• pydoc – built-in; used by help()
– generate HTML on any module
– kinda plain
• – old, rumored to be dead
– produces nicely formatted HTML
– easy to install and use
• Sphinx framework
– “intelligent and beautiful documentation”
– all the cool kids are using it (docs.python.org)
– more involved to set up and use
Slide 58
Branching out
Slide 59
Installing packages
Slide 60
Installing packages (on Windows)
• Windows executables
• Python eggs
– .zip file with metadata, renamed .egg
– distributes code as a bundle
– need easy_install
• pip
– tool for installing and managing Python packages
– replacement for easy_install
Slide 61
pip
• can take care of dependencies for you
• uninstallation!
• install via easy_install, ironically
–
Slide 62
virtualenv
• a tool to create isolated Python environments
• manage dependencies on a per-project basis,
rather than globally installing
• test modules without installing into site-packages
• avoid unintentional upgrades
Slide 63
virtualenv
• install via pip, easy_install, or by
• create the env
• activate the env
• use the env
Slide 64
virtualenv
• installs Python where you tell it, modifies
system path to point there
– good only while the env is activated
• use yolk to list installed packages in env
• But can this work in ArcMap Python prompt?
Slide 65
virtualenv
• YES, with a little work...
• tells ArcMap to use Python interpreter in our
virtualenv
– kill ArcMap, back to using default interpreter
Slide 67
The web
• Infinite source of information
• Right-click and “Save as” is so lame (and too
much work)
• Python can help you exploit the web
– FTP (ftplib), HTTP (urllib), browser automation (mechanize), scraping (Beautiful Soup), email (smtplib)
Slide 68
Fetching data
• Built-in libraries for FTP and HTTP
• ftplib – log in, nav to directory, retrieve files
• urllib/urllib2 – pass in the URL you want, get its contents back (urllib.request in Python 3)
• wget – GNU command-line tool
– Can call with os.system()
Slide 69
Fetching data
Slide 70
Scraping
• Scrape data from a web page
• Well-structured content is a HUGE help, as is
valid markup, which isn’t always there
• Beautiful Soup – 3rd party module
– Built-in methods and regexes help out
– Great for getting at tables of data
Emailing
• smtplib built-in library
• best if you have IP of your email server
• port blocking can be an issue
• there’s always Gmail too…
Slide 75
Files
• built-in open function – read() slurps the entire file into memory – OK except for huge files
• iterate over the lines
• CSV module
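A sketch of the three approaches above, in Python 3. The file name and its contents are made up; the file is written to a temporary folder first so the example is self-contained.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sites.csv")   # hypothetical sample file
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "x", "y"])
    writer.writerow(["site_a", "1.5", "2.5"])
    writer.writerow(["site_b", "3.0", "4.0"])

# Slurp the entire file into memory
with open(path) as f:
    whole = f.read()

# Iterate over the lines, one at a time (better for huge files)
line_count = 0
with open(path) as f:
    for line in f:
        line_count += 1

# Let the csv module split fields; DictReader keys each row by the header
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))
```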
Slide 77
Excel
• love, hate, love
• many modules out there
– xlrd (read) / xlwt (write) – only .xls
– openpyxl – read/write .xlsx
• uses
– Push text data to Excel file
– Push feature class data to Excel programmatically
– Read someone else’s “database”
Slide 78
Reading Excel
Slide 79
Writing Excel
Slide 80
Writing Excel
Slide 81
Databases
• You can connect to pretty much ANY database
• Is there one true solution??
• pyodbc – Access, SQL Server, MySQL
• Oracle – cx_Oracle
• Others – pymssql, _mssql, MySQLdb
• Execute SQL statements through a connection
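A sketch of the connect, cursor, execute pattern, in Python 3. It uses the standard-library sqlite3 module with an in-memory database as a stand-in so it runs anywhere; it is not one of the drivers listed above, though those follow a broadly similar pattern. The table and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")       # stand-in for a real database connection
cur = conn.cursor()

cur.execute("CREATE TABLE wells (id INTEGER PRIMARY KEY, name TEXT, depth_ft REAL)")
cur.executemany(
    "INSERT INTO wells (name, depth_ft) VALUES (?, ?)",
    [("W-1", 120.0), ("W-2", 305.5)],
)
conn.commit()

# Parameter placeholders keep values out of the SQL string itself
cur.execute("SELECT name FROM wells WHERE depth_ft > ? ORDER BY name", (200,))
deep = [row[0] for row in cur.fetchall()]

conn.close()
```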
Slide 82
Resources - FREE
• Dive into Python
• Python Cookbook
• Think Python
• Python docs
• gis.stackexchange.com
• Google is your friend (as always)
• Python community is HUGE and GIVING
Slide 83
Conferences
• pyArkansas – annually in Conway
– pyar2 list on python.org
• PyCon – THE national US Python conference
• FOSS4G – international open source for GIS
• ESRI Developer Summit – major dork-fest, but
great learning opportunity and Palm Springs in
March
Slide 84
IDEs and editors
• Wing – different license levels, good people
• PyScripter – open source, code completion
• Komodo – free version also available
• Notepad2 – ol’ standby editor
• Notepad++ – people swear by it
• PythonWin – another standby, but barebones
• …dozens (at least) more editors out there…
Slide 85
More reading
• http://www.voidspace.org.uk/python/articles/OOP.shtml – great OOP article (which I used a lot)