Advanced geoprocessing with Python

A four-hour training course given at the Mid-America GIS Consortium meeting in 2012 in Kansas City, MO.

Chad Cooper

April 23, 2012

Transcript

  1. Advanced geoprocessing with…
    MAGIC 2012
    Chad Cooper
    Center for Advanced Spatial Technologies
    University of Arkansas, Fayetteville

  2. Intros
    • your name
    • what you do/where you work
    • used Python much?
    – any formal training?
    – what do you use it for?
    • know any other languages?

  3. Objectives
    • informal class
    – expect tangents
    – code as we go
    • not geared totally to ArcGIS
    • THINK – oddball and out of the ordinary
    applications will make you want more…

  4. Outline
    • data types review
    • functions
    • procedural vs. OOP
    • geometries
    • rasters
    • spatial references
    • error handling/logging
    • documentation
    • 3rd party modules
    • module installation
    • the web
    – fetching
    – scraping
    – email
    – FTP
    • files

  5. Strings
    • ordered collections of characters
    • immutable – can’t change it
    • raw strings:
    • slicing
    • indexing:
    • iteration/membership
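
The string features listed on this slide can be sketched as follows. The path and county name are made-up values for illustration, and the code is written for Python 3.

```python
# A raw string leaves backslashes alone, which is handy for Windows paths.
path = r"C:\data\new_roads.shp"   # hypothetical path, for illustration only

county = "Washington"

# Strings are immutable: item assignment raises TypeError.
try:
    county[0] = "w"
except TypeError:
    changed = False
else:
    changed = True

first = county[0]            # indexing, zero-based
last = county[-1]            # negative indexes count from the end
prefix = county[:4]          # slicing: characters 0 through 3
has_ton = "ton" in county    # membership test

# Iteration walks the string one character at a time.
vowels = []
for ch in county:
    if ch in "aeiou":
        vowels.append(ch)
```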

  6. Strings
    • string formatting:
    • useful string formatting:
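
The original slide's code is not in the transcript, so the following is only a sketch of two common formatting styles, the `%` operator and `str.format()`. The county name and area are invented values.

```python
name = "Benton"
area_sq_mi = 884.2567   # made-up value for illustration

# printf-style formatting with the % operator
msg_percent = "%s County covers %.1f square miles" % (name, area_sq_mi)

# str.format() with positional fields
msg_format = "{0} County covers {1:.1f} square miles".format(name, area_sq_mi)

# Zero-padding is useful for building file or tile names
tile = "tile_%03d.tif" % 7
```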

  7. Lists
    • list – ordered collection of arbitrary objects
    • ordered
    • mutable – you can change it

  8. Lists…
    • iterable – very important!
    • membership
    • nestable – 2D array/matrix
    • access by index – zero based
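
A short sketch of the list properties from the last two slides, using made-up layer names:

```python
layers = ["roads", "parcels", "streams"]

layers.append("wells")        # mutable: grow in place
layers[0] = "highways"        # mutable: replace by zero-based index

has_parcels = "parcels" in layers   # membership

# iterable: loop over the items directly
lengths = []
for layer in layers:
    lengths.append(len(layer))

# nestable: a list of lists works as a small 2D matrix
grid = [[1, 2, 3],
        [4, 5, 6]]
center_right = grid[1][2]     # row 1, column 2
```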

  9. Dictionaries
    • unordered collection of arbitrary objects
    • key/value pairs – think hash/lookup table (keys
    don’t have to be numbers)
    • nestable, mutable
    • access by key, not offset

  10. Dictionaries
    • iterable
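
A sketch of the dictionary behavior described on these two slides. The codes and names are illustrative only. Keys are sorted before looping so the result does not depend on storage order.

```python
# keys don't have to be numbers; here string codes map to county names
counties = {"007": "Benton", "143": "Washington"}

counties["015"] = "Carroll"                  # mutable: add a key/value pair
name = counties["143"]                       # access by key, not offset
missing = counties.get("999", "unknown")     # safe lookup with a default

# nestable: values can be dictionaries too
stats = {"Benton": {"seat": "Bentonville"}}
seat = stats["Benton"]["seat"]

# iterable: looping over a dictionary yields its keys
pairs = []
for code in sorted(counties):
    pairs.append((code, counties[code]))
```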

  11. Tuples
    • ordered collection of arbitrary objects
    • immutable – cannot add, remove, or change items
    • access by offset
    • basically an unchangeable list
    • so what’s the purpose?
    – FAST – great for iterating over constant set of
    values
    – SAFE – you can’t change it

  12. List comprehensions
    • Map one list to another by applying a function
    to each of the list elements
    • Original list goes unchanged
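
A minimal sketch of the idea, with invented elevation values: each comprehension builds a new list and leaves the original alone.

```python
elevations_ft = [1400, 2753, 850]

# Build a new list by applying an expression to every element
elevations_m = [round(ft * 0.3048, 1) for ft in elevations_ft]

# An optional condition filters elements as they are mapped
high_points = [ft for ft in elevations_ft if ft > 1000]
```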

  13. Sets
    • unordered collections of objects
    • like mathematical sets – collection of distinct
    objects – NO DUPLICATES
    • example – get rid of dups in a list via list comp
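
The slide's own code is not in the transcript. One way to drop duplicates with a list comprehension while keeping the original order is to keep an item only if it has not appeared earlier in the list:

```python
names = ["Benton", "Carroll", "Benton", "Boone", "Carroll"]

# Keep a name only when it does not occur in the slice before its position
unique = [n for i, n in enumerate(names) if n not in names[:i]]
```

This rescans the list for every item, so it suits short lists. A set is the better tool when order does not matter.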

  14. Sets
    • get rid of dups via set:
    • union:

  15. Sets
    • intersection – data in both sets
    • symmetric difference – data in one set or the other, but not both
    • difference – data in first set but not second
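
The set operations from the last two slides, sketched with made-up site IDs:

```python
surveyed = set(["A1", "B2", "C3", "B2"])   # the duplicate "B2" collapses
mapped = set(["B2", "C3", "D4"])

both = surveyed & mapped          # intersection
either = surveyed | mapped        # union
only_one = surveyed ^ mapped      # symmetric difference
not_mapped = surveyed - mapped    # difference: in first set but not second
```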

  16. Programming paradigms:
    big blob of code
    • OK on a small scale for GP scripts
    • gets out of hand quickly
    • hard to follow
    • think ModelBuilder-exported code

  17. Programming paradigms:
    procedural programming
    • basically a list of instructions
    • program is built from one or more procedures
    (functions) – reusable chunks
    • procedures can be called at any time, anywhere in
    the program
    • focus is to break task into collection of variables,
    data structures, subroutines
    • natural style, easy to understand
    • strict separation between code and data

  18. Functions
    • portion of code within a larger program that
    performs a specific task
    • can be called anytime, anyplace
    • can accept arguments
    • should return a value
    • keeps code neat
    • promotes smooth flow
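
As a sketch of these points, here is a small function (a hypothetical helper, not from the course materials) that accepts arguments, one of them optional, and returns a value:

```python
def feet_to_meters(feet, ndigits=2):
    """Convert a length in feet to meters, rounded to ndigits places."""
    return round(feet * 0.3048, ndigits)

# Call it wherever the conversion is needed
summit_m = feet_to_meters(2753)
rough_m = feet_to_meters(100, 0)
```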

  19. Functions

  20. Programming paradigms:
    Procedural example

  21. Programming paradigms:
    Object-oriented programming (OOP)
    • break program down into data types (classes)
    that associate behavior (methods) with data
    (members or attributes)
    • code becomes more abstract
    • data and functions for dealing with it are
    bound together in one object
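
A minimal sketch of a class that binds data (attributes) to behavior (methods). The `Parcel` class and its values are hypothetical.

```python
class Parcel(object):
    """A land parcel that carries its data and the behavior that uses it."""

    def __init__(self, parcel_id, acres):
        self.parcel_id = parcel_id    # attribute (data)
        self.acres = acres            # attribute (data)

    def hectares(self):               # method (behavior)
        """Return the parcel area in hectares, rounded to two places."""
        return round(self.acres * 0.404686, 2)


lot = Parcel("001-0042", 10)
lot_hectares = lot.hectares()
```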

  22. Programming paradigms:
    Object-oriented programming (OOP)

  23. Programming paradigms:
    Object-oriented programming (OOP)
    • objects let you wrap complex processes, but
    present a simple interface to them
    • methods and attributes are encapsulated
    inside the object
    • methods and attributes are exposed to users
    • you can then update the object without
    breaking the interface
    • you can pass objects around - carefully

  24. Programming paradigms:
    OOP - Inheritance
    • classes can inherit attributes and methods
    • allows you to reuse and customize existing
    code inside a new class
    • you can override methods
    • you can add new methods to a class without
    modifying the existing class
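
The inheritance points above can be sketched with two hypothetical classes. `Road` reuses the setup code from `Feature`, overrides one method, and adds a new one, all without touching `Feature` itself.

```python
class Feature(object):
    """Base class: anything with a name."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Feature: %s" % self.name


class Road(Feature):
    """Subclass: inherits from Feature and customizes it."""

    def __init__(self, name, lanes):
        Feature.__init__(self, name)    # reuse the parent's setup
        self.lanes = lanes

    def describe(self):                 # override an inherited method
        return "Road: %s (%d lanes)" % (self.name, self.lanes)

    def is_divided(self):               # add a new method
        return self.lanes >= 4


generic = Feature("Lake Fayetteville")
highway = Road("I-49", 4)
```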

  25. Programming paradigms:
    OOP - Inheritance

  26. Programming paradigms:
    OOP - Inheritance

  27. Modularizing code
    • I’m lazy, so I want to reuse code
    • import statement – call functionality in
    another module
    • Have one custom module (a .py file) with code
    you use all the time
    • Great way to package up helper functions
    • ESRI does this with ConversionUtils.py
    C:\Program Files (x86)\ArcGIS\Server10.0\ArcToolBox\Scripts

  28. Geometries
    • hierarchy:
    – feature class is made of features
    – feature is made of parts
    – part is made of points
    • hierarchy in Pythonic terms:
    – part:
    – multipart polygon:
    – single part polygon with hole:
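
The slide's code did not survive in the transcript. The nesting can be sketched in plain Python, with `(x, y)` tuples standing in for point objects and invented coordinates. Using `None` to separate the exterior ring from the hole is an assumption here, modeled on how arcpy is understood to mark interior rings with a null point. `count_rings` is a hypothetical helper.

```python
# part: a list of points
part = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]

# multipart polygon: a list of parts
multipart = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)],
    [(20, 0), (20, 5), (25, 5), (25, 0), (20, 0)],
]

# single-part polygon with a hole: one part, with None separating
# the exterior ring from the interior ring (assumed convention)
with_hole = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0),
     None,
     (4, 4), (6, 4), (6, 6), (4, 6), (4, 4)],
]


def count_rings(feature):
    """Count rings in a feature: one per part, plus one per None separator."""
    rings = 0
    for part_points in feature:
        rings += 1 + part_points.count(None)
    return rings
```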

  29. Reading geometry
    • accessed through the geometry object of a
    feature
    • example: describe_geometry_arcmap.py
    1. open up SearchCursor
    2. loop through rows
    3. get geometry
    4. print out X, Y

  30. Reading geometry

  31. Reading geometry

  32. Reading geometry

  33. Writing geometry

    • point features are point objects, lines and
    polygons are arrays of point objects

    • Geometry objects can be created using the
    Geometry, Multipoint, PointGeometry,
    Polygon, or Polyline classes

  34. Writing geometry

  35. Writing geometry

  36. Rasters
    • Raster class
    – raster object: variable that references a raster
    dataset
    – gives access to raster props
    • raster calculations – Map Algebra

    – can cast to Raster object for calculations

  37. Rasters

  38. Spatial references
    • can get properties from
    • SpatialReference class
    • methods to create/edit spatial refs

  39. Spatial references
    • SpatialReference class
    • methods to create/edit spatial refs

  41. Exception Handling
    • It’s necessary, stuff fails
    • Useful error reporting
    • Proper application cleanup
    • Combine it with logging

  42. Exception handling – try/except
    • most basic form of error handling
    • wrap whole program or portions of code
    • use optional finally clause for cleanup
    – close open files
    – close database connections
    – check extensions back in
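
A sketch of the try/except pattern with a cleanup clause, using a file as the resource to release. `read_first_line` is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be opened."""
    handle = None
    try:
        handle = open(path)
        return handle.readline().strip()
    except IOError:
        # the file is missing or unreadable
        return None
    finally:
        # runs whether or not an exception occurred: clean up here
        if handle is not None:
            handle.close()


# Demonstrate with a throwaway file
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("hello\nworld\n")

first = read_first_line(tmp_path)      # file exists
os.remove(tmp_path)
missing = read_first_line(tmp_path)    # file is gone, so the except branch runs
```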

  43. Exception handling

  44. Exception handling

  45. Exception handling - raise
    • allows you to force an exception to occur
    • can be used to alert of conditions
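
A minimal sketch of forcing an exception when an input is unusable. The function name and message are made up for illustration.

```python
def validate_cell_size(cell_size):
    """Return cell_size unchanged, or raise ValueError if it is not positive."""
    if cell_size <= 0:
        raise ValueError("cell size must be positive, got %r" % cell_size)
    return cell_size


try:
    validate_cell_size(-30)
except ValueError as err:
    message = str(err)
```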

  46. Exception handling - raise

  47. Exception handling
    AddError and traceback
    • AddError – returns GP-specific errors
    • traceback – prints stack trace; determines
    precise location of error
    – good for larger, more complex programs
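
A sketch of the traceback half only, since arcpy is not available outside ArcGIS. `risky` is a hypothetical function that fails on purpose.

```python
import traceback


def risky():
    """Fail on purpose so there is a stack trace to capture."""
    return 1 / 0


try:
    risky()
except ZeroDivisionError:
    # format_exc() returns the current stack trace as a string,
    # including the file, line number, and function where it failed
    trace_text = traceback.format_exc()
    # in a script tool, this string could be passed to arcpy.AddError()
```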

  48. Exception handling –
    AddError and traceback

  49. Logging
    • logging module
    • logging levels:
    – DEBUG: detailed; for troubleshooting
    – INFO: normal operation, statuses
    – WARNING: still working, but unexpected behavior
    – ERROR: more serious, some function not working
    – CRITICAL: program cannot continue
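
The standard library's five levels (DEBUG, INFO, WARNING, ERROR, CRITICAL) are numeric constants, and a logger drops anything below its configured level. A small sketch, with a made-up logger name:

```python
import logging

# Map each level name to its numeric value; higher means more severe.
levels = {}
for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    levels[name] = getattr(logging, name)

logger = logging.getLogger("levels_demo")   # hypothetical logger name
logger.setLevel(logging.WARNING)

# Messages below the logger's level are dropped.
info_enabled = logger.isEnabledFor(logging.INFO)
error_enabled = logger.isEnabledFor(logging.ERROR)
```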

  50. Super-basic logging

  51. Super-basic logging to a log file

  52. Super-basic logging to a log file

  53. Meaningful logging
    • “customize” the logger
    • add in info-level message(s) to get logged
    • log our errors to log file
    • can get much more advanced, see the docs
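
One possible shape for this, not the course's actual script: a named logger with a file handler and a custom format, an info-level status message, and an error logged with its traceback. The logger name, file name, and messages are all hypothetical.

```python
import logging
import os
import tempfile

# Write the log into a throwaway directory for this example
log_path = os.path.join(tempfile.mkdtemp(), "geoprocessing.log")

logger = logging.getLogger("gp_demo")
logger.setLevel(logging.INFO)
logger.propagate = False    # keep messages out of the root logger

# "Customize" the logger: send records to a file with a chosen format
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("Starting buffer step")
try:
    1 / 0
except ZeroDivisionError:
    # exception() logs at ERROR level and appends the traceback
    logger.exception("Buffer step failed")

# Release the file before reading it back
handler.close()
logger.removeHandler(handler)

with open(log_path) as f:
    log_text = f.read()
```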

  54. Meaningful logging

  55. Meaningful logging

  56. Code documentation
    • Pythonic standards covered in PEPs 8 and 257
    • help()
    • comments need to be worth it
    • name items well
    • be precise and compact
    • comments may be for you
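
A sketch of a PEP 257 style docstring on a hypothetical helper. The docstring is what `help()` displays, and it is also available as the function's `__doc__` attribute.

```python
def dms_to_dd(degrees, minutes, seconds):
    """Convert degrees, minutes, and seconds to decimal degrees.

    All three arguments are numbers; the result is a float.
    """
    return degrees + minutes / 60.0 + seconds / 3600.0


# The first docstring line is the one-line summary
summary = dms_to_dd.__doc__.splitlines()[0]
latitude = dms_to_dd(36, 30, 0)
```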

  57. Creating documentation
    • pydoc – built-in; used by help()
    – generate HTML on any module
    – kinda plain
    • Epydoc – old, rumored to be dead
    – produces nicely formatted HTML
    – easy to install and use
    • Sphinx framework
    – “intelligent and beautiful documentation”
    – all the cool kids are using it (docs.python.org)
    – more involved to set up and use

  58. Branching out

  59. Installing packages

  60. Installing packages (on Windows)
    • Windows executables
    • Python eggs
    – .zip file with metadata, renamed .egg
    – distributes code as a bundle
    – need easy_install
    • pip
    – tool for installing and managing Python packages
    – replacement for easy_install

  61. pip
    • can take care of dependencies for you
    • uninstallation!
    • install via easy_install, ironically

  62. virtualenv
    • a tool to create isolated Python environments
    • manage dependencies on a per-project basis,
    rather than globally installing
    • test modules without installing into site-packages
    • avoid unintentional upgrades

  63. virtualenv
    • install via pip, easy_install, or by
    • create the env
    • activate the env
    • use the env

  64. virtualenv
    • installs Python where you tell it, modifies
    system path to point there
    – good only while the env is activated
    • use yolk to list installed packages in env
    • But can this work in ArcMap Python prompt?

  65. virtualenv
    • YES, with a little work...
    • tells ArcMap to use Python interpreter in our
    virtualenv
    – kill ArcMap, back to using default interpreter

  67. The web
    • Infinite source of information
    • Right-click and “Save as” is so lame (and too
    much work)
    • Python can help you exploit the web
    – ftplib, http (urllib), mechanize, scraping (Beautiful
    Soup), send email (smtplib)

  68. Fetching data
    • Built-in libraries for ftp and http
    • ftplib – log in, nav to directory, retrieve files
    • urllib/urllib2 – pass in the url you want, get it
    back
    • wget – GNU commandline tool
    – Can call with os.system()

  69. Fetching data

  70. Scraping
    • Scrape data from a web page
    • Well-structured content is a HUGE help, as is
    valid markup, which isn’t always there
    • BeautifulSoup 3rd party module
    – Built-in methods and regexes help out
    – Great for getting at tables of data

  71. Scraping addresses
    http://www.phillypal.com/pal_locations.php

  72. Scraping addresses

  73. Scraping addresses

  74. Emailing
    • smtplib built-in library
    • best if you have IP of your email server
    • port blocking can be an issue
    • there’s always Gmail too…

  75. Files
    • built in open function – slurp entire file into
    memory – OK except for huge files
    • iterate over the lines
    • CSV module
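
The three approaches on this slide, sketched against a small throwaway CSV file. The file name and site values are invented, and the `newline=""` argument assumes Python 3.

```python
import csv
import os
import tempfile

csv_path = os.path.join(tempfile.mkdtemp(), "sites.csv")   # hypothetical file

# Write a small file to read back
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "x", "y"])
    writer.writerow(["Site A", "-94.17", "36.06"])
    writer.writerow(["Site B", "-92.29", "34.75"])

# 1. Slurp the entire file into memory: fine unless the file is huge
with open(csv_path) as f:
    whole_file = f.read()

# 2. Iterate over the lines: only one line is held in memory at a time
line_count = 0
with open(csv_path) as f:
    for line in f:
        line_count += 1

# 3. csv module: handles delimiters and quoting; DictReader keys each
#    row by the header line
with open(csv_path, newline="") as f:
    rows = list(csv.DictReader(f))
```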

  77. Excel
    • love, hate, love
    • many modules out there
    – xlrd (read) / xlwt (write) – only .xls
    – openpyxl – read/write .xlsx
    • uses
    – Push text data to Excel file
    – Push featureclass data to Excel programmatically
    – Read someone else’s “database”

  78. Reading Excel

  79. Writing Excel

  80. Writing Excel

  81. Databases
    • You can connect to pretty much ANY database
    • Is there one true solution??
    • pyodbc – Access, SQL Server, MySQL
    • Oracle – cx_Oracle
    • Others – pymssql, _mssql, MySQLdb
    • Execute SQL statements through a connection
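
The modules named above are third-party, so this sketch swaps in the standard library's `sqlite3` with an in-memory database to show the connect, execute, fetch pattern. The table and values are made up. Drivers such as pyodbc follow the same general DB-API shape, though their connection strings differ.

```python
import sqlite3

# Open a connection (in memory here, so nothing is written to disk)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Execute SQL statements through the connection's cursor
cur.execute(
    "CREATE TABLE wells (id INTEGER PRIMARY KEY, county TEXT, depth_ft REAL)"
)
cur.executemany(
    "INSERT INTO wells (county, depth_ft) VALUES (?, ?)",
    [("Benton", 120.0), ("Washington", 85.5), ("Benton", 240.0)],
)
conn.commit()

# Query and pull the results back as a list of tuples
cur.execute("SELECT county, COUNT(*) FROM wells GROUP BY county ORDER BY county")
counts = cur.fetchall()

conn.close()
```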

  82. Resources - FREE
    • Dive into Python
    • Python Cookbook
    • Think Python
    • Python docs
    • gis.stackexchange.com
    • Google is your friend (as always)
    • Python community is HUGE and GIVING

  83. Conferences
    • pyArkansas – annually in Conway
    – pyar2 list on python.org
    • PyCon – THE national US Python conference
    • FOSS4G – international open source for GIS
    • ESRI Developer Summit – major dork-fest, but
    great learning opportunity and Palm Springs in
    March

  84. IDEs and editors
    • Wing – different license levels, good people
    • PyScripter – open source, code completion
    • Komodo – free version also available
    • Notepad2 – ole’ standby editor
    • Notepad++ - people swear by it
    • PythonWin – another standby, but barebones
    • …dozens (at least) more editors out there…

  85. More reading
    • http://www.voidspace.org.uk/python/articles/OOP.shtml
    - great OOP article (which I used a lot)
