Advanced geoprocessing with Python

A four-hour training course given at the Mid-America GIS Consortium meeting in 2012 in Kansas City, MO.

Chad Cooper

April 23, 2012

Transcript

  1. Advanced geoprocessing with… • MAGIC 2012 • Chad Cooper – chad@cast.uark.edu • Center for Advanced Spatial Technologies, University of Arkansas, Fayetteville
  2. Intros • your name • what you do/where you work

    • used Python much? – any formal training? – what do you use it for? • know any other languages?
  3. Objectives • informal class – expect tangents – code as we go • not geared totally to ArcGIS • THINK – oddball and out-of-the-ordinary applications will make you want more…
  4. Outline • data types review • functions • procedural vs. OOP • geometries • rasters • spatial references • error handling/logging • documentation • 3rd-party modules • module installation • the web – fetching – scraping – email – FTP • files
  5. Strings • ordered collections of characters • immutable – can’t change it • raw strings • slicing • indexing • iteration/membership
  6. Strings • string formatting: • useful string formatting:
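The code on slides 5 and 6 did not survive in this transcript. Below is a minimal sketch of the string features they list. It assumes Python 3; the course itself targeted the Python 2 bundled with ArcGIS 10.0. The path and values are hypothetical.

```python
# Raw strings keep backslashes literally, handy for Windows paths
path = r"C:\data\roads.shp"            # hypothetical path

name = "Fayetteville"
first = name[0]       # indexing is zero-based -> "F"
last = name[-1]       # negative indexes count from the end -> "e"
chunk = name[0:5]     # slicing: characters 0 through 4 -> "Fayet"

# Strings are immutable: item assignment raises TypeError
immutable = False
try:
    name[0] = "f"
except TypeError:
    immutable = True

# Iteration and membership
vowels = [ch for ch in name if ch in "aeiou"]
has_ville = "ville" in name

# String formatting
fc, count = "roads", 1523                            # hypothetical values
old_style = "%s has %d features" % (fc, count)      # % operator
msg = "{0} has {1} features".format(fc, count)      # str.format
pct = "{0:.1f}%".format(12.34)                      # one decimal place
tile = "tile_{0:03d}.tif".format(7)                 # zero-padded sequential name
```

Zero-padded names like `tile_007.tif` sort correctly in a file listing, which is one reason format specs are worth learning.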

  7. Lists • ordered collection of arbitrary objects • mutable – you can change it • extend() concatenates lists
  8. Lists… • iterable – very important! • membership • nestable – 2D array/matrix • access by index – zero-based
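Here is a short sketch of the list properties from slides 7 and 8. The river names are just sample values.

```python
rivers = ["Arkansas", "White"]
rivers.append("Buffalo")               # mutable: add one item in place
rivers.extend(["Illinois", "Kings"])   # extend() concatenates another list onto this one

first = rivers[0]                      # access by index, zero-based
has_white = "White" in rivers          # membership test

name_lengths = []
for river in rivers:                   # lists are iterable
    name_lengths.append(len(river))

# Nested lists act as a simple 2D array/matrix: grid[row][col]
grid = [[1, 2, 3],
        [4, 5, 6]]
cell = grid[1][2]
```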
  9. Dictionaries • unordered collection of arbitrary objects • key/value pairs – think hash/lookup table (keys don’t have to be numbers) • nestable, mutable • access by key, not offset
  10. Dictionaries • iterable
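A minimal dictionary sketch with illustrative values. Plain dicts were unordered in the Python of 2012. Since Python 3.7 they keep insertion order, but sorting the keys is still the simplest way to get a predictable report order.

```python
states = {"AR": "Arkansas", "MO": "Missouri"}
states["KS"] = "Kansas"                     # mutable: add a key/value pair
arkansas = states["AR"]                    # access by key, not by offset
texas = states.get("TX", "not loaded")     # .get() returns a default instead of raising KeyError

# Nestable: values can be other dictionaries (hypothetical layer metadata)
layers = {"roads": {"type": "Polyline", "count": 120},
          "parcels": {"type": "Polygon", "count": 4500}}
parcel_type = layers["parcels"]["type"]

# Iterable: looping over a dict yields keys; items() yields (key, value) pairs
codes = [code for code in sorted(states)]
summary = ["{0}: {1}".format(k, v) for k, v in sorted(states.items())]
```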

  11. Tuples • ordered collection of arbitrary objects • immutable – cannot add or remove items • access by offset • basically an unchangeable list • so what’s the purpose? – FAST – great for iterating over a constant set of values – SAFE – you can’t change it
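A small tuple sketch; the extent values are hypothetical. Tuples do support the read-only `index()` and `count()` methods (since Python 2.6). What they lack is any way to add, remove, or reassign items.

```python
# A hypothetical extent in projected units: (xmin, ymin, xmax, ymax)
extent = (500000, 3980000, 520000, 4000000)
xmin, ymin, xmax, ymax = extent      # unpacking by position
width = xmax - xmin
first = extent[0]                    # access by offset

# Immutable: no append/remove, and item assignment raises TypeError
try:
    extent[0] = 0
    immutable = False
except TypeError:
    immutable = True

# Good for iterating over a constant set of values
SHAPE_TYPES = ("Point", "Polyline", "Polygon")
labels = [shape.lower() for shape in SHAPE_TYPES]
polygon_position = SHAPE_TYPES.index("Polygon")   # read-only lookups still work
```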
  12. List comprehensions • Map one list to another by applying a function to each of the list elements • Original list goes unchanged
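A minimal list-comprehension sketch with hypothetical layer names. It maps one list to another and optionally filters it, leaving the original list as it was.

```python
names = ["roads", "rivers", "parcels"]

# Apply an expression to every element
shapefiles = [n + ".shp" for n in names]

# Apply a function, keeping only the elements that pass a test
r_layers = [n.upper() for n in names if n.startswith("r")]

# The original list is untouched
print(names)   # ['roads', 'rivers', 'parcels']
```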
  13. Sets • unordered collections of objects • like mathematical sets – collection of distinct objects – NO DUPLICATES • example – get rid of dups in a list via list comp
  14. Sets • get rid of dups via set: • union:

  15. Sets • intersection – items in both sets • symmetric difference – items in either set, but not both • difference – items in the first set but not the second
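Slides 13 to 15 lost their code. The sketch below shows deduplication with a list comprehension and with a set, followed by the four set operations. The layer names are hypothetical.

```python
ids = [3, 1, 3, 2, 1]

# Dedupe via list comprehension: keep an item only if it did not appear earlier.
# Preserves first-seen order, but rescans the list each time (slow on big lists).
unique_lc = [x for i, x in enumerate(ids) if x not in ids[:i]]

# Dedupe via set: fast, but sets are unordered, so sort if order matters
unique_set = sorted(set(ids))

a = {"roads", "rivers", "parcels"}
b = {"rivers", "parcels", "wells"}
union = a | b            # everything in either set
intersection = a & b     # items in both sets
sym_diff = a ^ b         # items in one set or the other, but not both
difference = a - b       # items in a but not in b
```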
  16. Programming paradigms: big blob of code • OK on a small scale for GP scripts • gets out of hand quickly • hard to follow • think ModelBuilder-exported code
  17. Programming paradigms: procedural programming • basically a list of instructions • program is built from one or more procedures (functions) – reusable chunks • procedures called at any time, anywhere in program • focus is to break task into collection of variables, data structures, subroutines • natural style, easy to understand • strict separation between code and data
  18. Functions • portion of code within a larger program that performs a specific task • can be called anytime, anyplace • can accept arguments • should return a value • keeps code neat • promotes smooth flow
  19. Functions

  20. Programming paradigms: Procedural example
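The procedural example on slide 20 is missing here. Below is a minimal procedural sketch with hypothetical functions: small reusable functions that each take arguments and return a value, and a `main()` that is just an ordered list of calls to them.

```python
def feet_to_meters(feet):
    """Convert a length in international feet to meters."""
    return feet * 0.3048


def buffer_distances(base, steps):
    """Return `steps` buffer distances in multiples of `base`."""
    return [base * i for i in range(1, steps + 1)]


def main():
    # Procedural style: the program is an ordered series of function calls
    distances_ft = buffer_distances(100, 3)
    return [feet_to_meters(d) for d in distances_ft]


if __name__ == "__main__":
    print(main())
```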

  21. Programming paradigms: Object-oriented programming (OOP) • break program down into data types (classes) that associate behavior (methods) with data (members or attributes) • code becomes more abstract • data and functions for dealing with it are bound together in one object
  22. Programming paradigms: Object-oriented programming (OOP)

  23. Programming paradigms: Object-oriented programming (OOP) • objects let you wrap complex processes, but present a simple interface to them • methods and attributes are encapsulated inside the object • methods and attributes are exposed to users • you can then update the object without breaking the interface • you can pass objects around – carefully
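Here is a minimal encapsulation sketch using a hypothetical `BoundingBox` class (unrelated to arcpy's own Extent). Callers use `area()` and `contains()` without knowing how the box stores its data, so the internals could change without breaking that interface.

```python
class BoundingBox(object):
    """Hypothetical class: a rectangle with a small public interface."""

    def __init__(self, xmin, ymin, xmax, ymax):
        # Attributes (data) bound to the object
        self.xmin, self.ymin = xmin, ymin
        self.xmax, self.ymax = xmax, ymax

    # Methods (behavior) that work on that data
    def width(self):
        return self.xmax - self.xmin

    def height(self):
        return self.ymax - self.ymin

    def area(self):
        return self.width() * self.height()

    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


box = BoundingBox(0, 0, 10, 5)
print(box.area())   # 50
```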
  24. Programming paradigms: OOP - Inheritance • classes can inherit attributes and methods • allows you to reuse and customize existing code inside a new class • you can override methods • you can add new methods to a class without modifying the existing class
  25. Programming paradigms: OOP - Inheritance

  26. Programming paradigms: OOP - Inheritance
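A minimal inheritance sketch with hypothetical `Layer` and `FeatureLayer` classes. It shows the three points from slide 24: reusing the parent's code, overriding a method, and adding a new method without touching the parent class.

```python
class Layer(object):
    """Base class: anything with a name."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Layer: " + self.name


class FeatureLayer(Layer):
    """Inherits name handling and describe() from Layer."""

    def __init__(self, name, shape_type):
        super(FeatureLayer, self).__init__(name)   # reuse the parent's setup
        self.shape_type = shape_type

    def describe(self):
        # Override: extend the parent's result instead of rewriting it
        base = super(FeatureLayer, self).describe()
        return "{0} ({1})".format(base, self.shape_type)

    def is_polygon(self):
        # New method; Layer itself is untouched
        return self.shape_type == "Polygon"


parcels = FeatureLayer("parcels", "Polygon")
print(parcels.describe())   # Layer: parcels (Polygon)
```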

  27. Modularizing code • I’m lazy, so I want to reuse code • import statement – call functionality in another module • Have one custom module (a .py file) with code you use all the time • Great way to package up helper functions • ESRI does this with ConversionUtils.py in C:\Program Files (x86)\ArcGIS\Server10.0\ArcToolBox\Scripts
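Normally your helper module is simply a .py file in a folder on `sys.path`. To keep this sketch self-contained, it writes a hypothetical `gp_helpers.py` to a temporary folder and then imports it like any other module.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for your own helper module (the module and function names are hypothetical)
helper_dir = tempfile.mkdtemp()
with open(os.path.join(helper_dir, "gp_helpers.py"), "w") as f:
    f.write(
        "def shp_name(name):\n"
        "    '''Append .shp unless it is already there.'''\n"
        "    return name if name.endswith('.shp') else name + '.shp'\n"
    )

# Put the folder on the module search path, then import as usual
sys.path.insert(0, helper_dir)
importlib.invalidate_caches()      # make sure the newly written file is noticed

import gp_helpers                  # the whole module
from gp_helpers import shp_name    # or just one function from it

print(shp_name("roads"))   # roads.shp
```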
  28. Geometries • hierarchy – feature class is made of features – feature is made of parts – part is made of points • hierarchy in Pythonic terms – part – multipart polygon – single-part polygon with hole
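The "Pythonic terms" on slide 28 were shown as code that is missing here. Below is a sketch of the same hierarchy in which plain nested lists of (x, y) tuples stand in for arcpy Point and Array objects. The coordinates are hypothetical.

```python
# A part is an ordered list of (x, y) vertices; a closed ring repeats its first point
part = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]

# A multipart polygon is a list of parts
multipart_polygon = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)],
    [(20, 20), (20, 30), (30, 30), (30, 20), (20, 20)],
]

# A single-part polygon with a hole: one part holding the outer ring, then the
# inner ring, separated by None (the marker arcpy returns between rings when reading)
polygon_with_hole = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0),
     None,
     (2, 2), (2, 8), (8, 8), (8, 2), (2, 2)],
]


def count_vertices(feature):
    """Count real vertices in a feature, skipping None ring separators."""
    return sum(1 for part in feature for point in part if point is not None)
```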
  29. Reading geometry • accessed through the geometry object of a feature • example: describe_geometry_arcmap.py – 1. open up SearchCursor – 2. loop through rows – 3. get geometry – 4. print out X, Y
  30. Reading geometry

  31. Reading geometry

  32. Reading geometry
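Here is a sketch of the four steps on slide 29. It is not the author's describe_geometry_arcmap.py. The function relies only on the old-style (ArcGIS 10.0) cursor and geometry interface: `row.getValue()`, `partCount`, `getPart()`, and point `X`/`Y`. The `Fake*` classes are hypothetical stand-ins so the sketch runs without ArcGIS.

```python
def describe_geometry(rows, shape_field):
    """Return one text line per part header, vertex, or interior-ring marker."""
    lines = []
    for row in rows:                                   # 2. loop through rows
        geom = row.getValue(shape_field)               # 3. get geometry
        for part_index in range(geom.partCount):
            lines.append("Part {0}:".format(part_index))
            for pnt in geom.getPart(part_index):
                if pnt:
                    lines.append("{0}, {1}".format(pnt.X, pnt.Y))   # 4. X, Y
                else:
                    lines.append("Interior ring:")     # None separates rings
    return lines


# --- Hypothetical stand-ins for arcpy objects, only so the sketch runs ---
class FakePoint(object):
    def __init__(self, x, y):
        self.X, self.Y = x, y


class FakeGeometry(object):
    def __init__(self, parts):
        self.parts = parts
        self.partCount = len(parts)

    def getPart(self, index):
        return self.parts[index]


class FakeRow(object):
    def __init__(self, shape):
        self.shape = shape

    def getValue(self, field):
        return self.shape


rows = [FakeRow(FakeGeometry([[FakePoint(0, 0), FakePoint(0, 1), None, FakePoint(1, 1)]]))]
lines = describe_geometry(rows, "Shape")
for line in lines:
    print(line)
```

On ArcGIS 10.0 you would instead pass `arcpy.SearchCursor(fc)` as `rows` (step 1 on the slide) and `arcpy.Describe(fc).shapeFieldName` as `shape_field`, where `fc` is the path to your feature class.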

  33. Writing geometry • point features are point objects; lines and polygons are arrays of point objects • Geometry objects can be created using the Geometry, Multipoint, PointGeometry, Polygon, or Polyline classes
  34. Writing geometry

  35. Writing geometry

  36. Rasters • Raster class – raster object: variable that references a raster dataset – gives access to raster properties • raster calculations – Map Algebra – can cast to Raster object for calculations
  37. Rasters

  38. Spatial references • can get properties from • class •

    methods to create/edit spatial refs
  39. Spatial references • class • methods to create/edit spatial refs

  41. Exception handling • It’s necessary, stuff fails • Useful error reporting • Proper application cleanup • Combine it with logging
  42. Exception handling – try/except • most basic form of error handling • wrap whole program or portions of code • use the optional finally clause for cleanup – close open files – close database connections – check extensions back in
  43. Exception handling

  44. Exception handling
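A minimal try/except/finally sketch with a hypothetical `count_lines()` helper. The `finally` clause runs whether or not an exception occurred, which makes it the place for cleanup such as closing files or database connections, or calling `arcpy.CheckInExtension()`.

```python
import os
import tempfile


def count_lines(path):
    """Count lines in a text file; return None if it cannot be read."""
    f = None
    try:
        f = open(path)
        return sum(1 for _ in f)
    except IOError as err:            # IOError is an alias of OSError in Python 3
        print("Could not read {0}: {1}".format(path, err))
        return None
    finally:
        # Cleanup runs on success and on failure
        if f is not None:
            f.close()


fd, tmp = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("a\nb\nc\n")

n = count_lines(tmp)                                                # 3
missing = count_lines(os.path.join(tempfile.mkdtemp(), "nope.txt"))  # None
```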

  45. Exception handling - raise • allows you to force an exception to occur • can be used to signal error conditions
  46. Exception handling - raise
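Below is a sketch of `raise`, using both a built-in exception and a custom one. The cell-size check and the `LicenseError` class are hypothetical. In a real script the license test would use something like `arcpy.CheckExtension("Spatial") == "Available"`.

```python
class LicenseError(Exception):
    """Custom exception, e.g. for when an extension license is unavailable."""
    pass


def check_cell_size(cell_size):
    # Force an exception when an input makes no sense
    if cell_size <= 0:
        raise ValueError("Cell size must be positive, got {0}".format(cell_size))
    return cell_size


def check_out(available):
    # Hypothetical stand-in for a real license check
    if not available:
        raise LicenseError("Spatial Analyst license is unavailable")
    return "checked out"


try:
    check_cell_size(-5)
except ValueError as err:
    value_message = str(err)

try:
    check_out(False)
except LicenseError as err:
    license_message = str(err)
```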

  47. Exception handling – AddError and traceback • – returns GP-specific errors • – prints stack trace; determines precise location of error – good for larger, more complex programs
  48. Exception handling – AddError and traceback
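A minimal sketch of capturing the stack trace with the standard `traceback` module. The arcpy calls appear only in a comment and are not executed. Inside an ArcGIS script tool you would typically pass the text to `arcpy.AddError()`, along with `arcpy.GetMessages(2)` for the geoprocessing errors.

```python
import traceback


def risky():
    return 1 / 0    # deliberately fails


try:
    risky()
except ZeroDivisionError:
    # format_exc() returns the full stack trace as a string,
    # including the file, line number, and function where it failed
    error_text = traceback.format_exc()
    # In a script tool (not run here):
    #     arcpy.AddError(error_text)
    #     arcpy.AddError(arcpy.GetMessages(2))
    print(error_text)
```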

  49. Logging • logging module • logging levels: – DEBUG: detailed; for troubleshooting – INFO: normal operation, statuses – WARNING: still working, but unexpected behavior – ERROR: more serious, some function not working – CRITICAL: program cannot continue
  50. Super-basic logging

  51. Super-basic logging to a log file

  52. Super-basic logging to a log file
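Here is a super-basic logging sketch: one `basicConfig()` call sends everything at DEBUG level and above to a file. The log location and messages are hypothetical. `force=True` needs Python 3.8 or later.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "gp_script.log")   # hypothetical log location

# Configure the root logger to write to a file.
# force=True (Python 3.8+) replaces any handlers configured earlier in the session.
logging.basicConfig(filename=log_path, level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s", force=True)

logging.debug("Starting clip")
logging.info("Clipped 42 features")
logging.warning("Output already existed; overwriting")

with open(log_path) as f:
    log_text = f.read()
```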

  53. Meaningful logging • “customize” the logger • add in info-level message(s) to get logged • log our errors to the log file • can get much more advanced, see the docs
  54. Meaningful logging

  55. Meaningful logging
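Below is a sketch of a "customized" logger: a named logger with its own file handler and format, INFO as the threshold, and errors logged together with their traceback. The logger name, file, and messages are hypothetical.

```python
import logging
import os
import tempfile


def make_logger(log_path):
    logger = logging.getLogger("gp_script")      # hypothetical logger name
    logger.setLevel(logging.INFO)                # DEBUG messages are dropped
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger, handler


log_path = os.path.join(tempfile.mkdtemp(), "gp_script.log")
logger, handler = make_logger(log_path)

logger.info("Clipping roads to county boundary")
logger.debug("Below INFO, so this is not written")
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("Clip failed")   # ERROR level, plus the traceback

# Detach the handler so re-running does not log everything twice
logger.removeHandler(handler)
handler.close()

with open(log_path) as f:
    log_text = f.read()
```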

  56. Code documentation • Pythonic standards covered in PEPs 8 and 257 • help() • comments need to be worth it • name items well • be precise and compact • comments may be for you
  57. Creating documentation • pydoc – built-in; used by help() – generates HTML for any module – kinda plain • – old, rumored to be dead – produces nicely formatted HTML – easy to install and use • Sphinx framework – “intelligent and beautiful documentation” – all the cool kids are using it (docs.python.org) – more involved to set up and use
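A minimal sketch of a docstring and the tools that read it. `help()` and pydoc display the docstring, and `python -m pydoc -w modulename` writes the HTML page mentioned on slide 57. The `slope_percent()` function is hypothetical.

```python
import inspect
import pydoc


def slope_percent(rise, run):
    """Return slope as a percentage.

    rise: vertical change, in the same units as run
    run: horizontal change; must not be zero
    """
    return 100.0 * rise / run


doc = inspect.getdoc(slope_percent)     # the cleaned-up docstring help() shows
text = pydoc.render_doc(slope_percent, renderer=pydoc.plaintext)  # help() output as plain text
print(text)
```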
  58. Branching out

  59. Installing packages

  60. Installing packages (on Windows) • Windows executables • Python eggs – .zip file with metadata, renamed .egg – distributes code as a bundle – need easy_install • pip – tool for installing and managing Python packages – replacement for easy_install
  61. pip • can take care of dependencies for you • uninstallation! • install via easy_install, ironically
  62. virtualenv • a tool to create isolated Python environments • manage dependencies on a per-project basis, rather than installing globally • test modules without installing into site-packages • avoid unintentional upgrades
  63. virtualenv • install via pip, easy_install, or by • create the env • activate the env • use the env
  64. virtualenv • installs Python where you tell it, modifies system path to point there – good only while the env is activated • use yolk to list installed packages in env • But can this work in the ArcMap Python prompt?
  65. virtualenv • YES, with a little work... • tells ArcMap to use the Python interpreter in our virtualenv – kill ArcMap, back to using default interpreter
  67. The web • Infinite source of information • Right-click and “Save as” is so lame (and too much work) • Python can help you exploit the web – ftplib, http (urllib), mechanize, scraping (Beautiful Soup), send email (smtplib)
  68. Fetching data • Built-in libraries for FTP and HTTP • ftplib – log in, nav to directory, retrieve files • urllib/urllib2 – pass in the URL you want, get it back • wget – GNU command-line tool – Can call with os.system()
  69. Fetching data
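Here is a sketch of fetching with the built-in libraries, assuming Python 3, where `urllib2.urlopen` now lives in `urllib.request`. The FTP host, folder, and file names are hypothetical, and `fetch_ftp()` is defined but not called. The demo uses a local `file://` URL so it runs offline; `http://` URLs go through the same `urlopen()` call.

```python
import os
import shutil
import tempfile
from ftplib import FTP
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url, dest_path):
    """Download url to dest_path; return the number of bytes written."""
    with urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)    # stream in chunks, not all at once
    return os.path.getsize(dest_path)


def fetch_ftp(host, remote_dir, filename, dest_path):
    """Anonymous FTP: log in, change directory, retrieve one file (not run here)."""
    ftp = FTP(host)
    try:
        ftp.login()                          # no arguments = anonymous login
        ftp.cwd(remote_dir)
        with open(dest_path, "wb") as out:
            ftp.retrbinary("RETR " + filename, out.write)
    finally:
        ftp.quit()


# Offline demo with a local file standing in for a web resource
work = tempfile.mkdtemp()
src = os.path.join(work, "sample.csv")
with open(src, "w") as f:
    f.write("id,name\n1,roads\n")
copied = os.path.join(work, "copy.csv")
size = fetch_url(Path(src).as_uri(), copied)
```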

  70. Scraping • Scrape data from a web page • Well-structured content is a HUGE help, as is valid markup, which isn’t always there • BeautifulSoup 3rd-party module – Built-in methods and regexes help out – Great for getting at tables of data
  71. Scraping addresses http://www.phillypal.com/pal_locations.php

  72. Scraping addresses

  73. Scraping addresses
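The address-scraping code on slides 71 to 73 is missing here. The sketch below pulls the rows out of an HTML table. It deliberately swaps BeautifulSoup (third-party, with a much friendlier search API) for the standard library's `html.parser`, so it runs without installs. The HTML and addresses are made up, not taken from the page on slide 71.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data        # text inside <b> etc. is appended too


html = """<table>
<tr><th>Center</th><th>Address</th></tr>
<tr><td>Example Center</td><td>123 Main St</td></tr>
<tr><td>Sample Center</td><td>456 <b>Oak</b> Ave</td></tr>
</table>"""

parser = TableParser()
parser.feed(html)
header, records = parser.rows[0], parser.rows[1:]
```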

  74. Emailing • smtplib built-in library • best if you have the IP of your email server • port blocking can be an issue • there’s always Gmail too…
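A minimal email sketch that builds a message with the standard `email` package and defines, but does not call, a sender built on `smtplib`. The addresses and server IP are hypothetical. For a provider such as Gmail you would typically also call `server.starttls()` and `server.login(user, password)` before sending.

```python
import smtplib
from email.mime.text import MIMEText


def build_message(sender, recipient, subject, body):
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def send(msg, server_ip, port=25):
    """Send through your own mail server (hypothetical IP); not called here."""
    server = smtplib.SMTP(server_ip, port)
    try:
        server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    finally:
        server.quit()


msg = build_message("gis@example.com", "boss@example.com",
                    "Nightly job finished", "All feature classes updated.")
# send(msg, "192.0.2.10")   # hypothetical server IP
```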
  75. Files • built-in open function – slurp entire file into memory – OK except for huge files • iterate over the lines • csv module
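Below is a sketch of the three file-reading approaches on slide 75, run against a small hypothetical CSV written to a temporary folder. It assumes Python 3, where the `newline=""` argument is the documented way to open files for the `csv` module.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wells.csv")   # hypothetical file
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["well_id", "depth_ft"])
    writer.writerows([["W1", 120], ["W2", 85]])

# 1. Slurp the whole file into memory (fine unless the file is huge)
with open(path) as f:
    everything = f.read()

# 2. Iterate line by line; only one line is held in memory at a time
with open(path) as f:
    line_count = sum(1 for line in f)

# 3. Let the csv module handle delimiters and quoting
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))
depths = [int(r["depth_ft"]) for r in rows]
```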
  77. Excel • love, hate, love • many modules out there – xlrd (read) / xlwt (write) – only .xls – openpyxl – read/write .xlsx • uses – Push text data to Excel file – Push feature class data to Excel programmatically – Read someone else’s “database”
  78. Reading Excel

  79. Writing Excel

  80. Writing Excel

  81. Databases • You can connect to pretty much ANY database • Is there one true solution?? • pyodbc – Access, SQL Server, MySQL • Oracle – cx_Oracle • Others – pymssql, _mssql, MySQLdb • Execute SQL statements through a connection
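The drivers on slide 81 all follow Python's DB-API pattern: connect, get a cursor, execute SQL, fetch results. This sketch swaps in the built-in `sqlite3` so it runs anywhere. With pyodbc the connection line would change, but the cursor calls look much the same. The table and values are hypothetical. The parameter placeholder style varies by driver: sqlite3 and pyodbc use `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a pyodbc.connect(...) call would go here instead
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE parcels (pin TEXT, acres REAL)")
    cur.executemany("INSERT INTO parcels VALUES (?, ?)",
                    [("001", 2.5), ("002", 10.0), ("003", 0.75)])
    conn.commit()

    # Parameterized query: let the driver handle quoting
    cur.execute("SELECT pin, acres FROM parcels WHERE acres > ? ORDER BY pin", (1.0,))
    big_parcels = cur.fetchall()
finally:
    conn.close()
```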
  82. Resources - FREE • Dive into Python • Python Cookbook

    • Think Python • Python docs • gis.stackexchange.com • Google is your friend (as always) • Python community is HUGE and GIVING
  83. Conferences • pyArkansas – annually in Conway – pyar2 list on python.org • PyCon – THE national US Python conference • FOSS4G – international open source for GIS • ESRI Developer Summit – major dork-fest, but great learning opportunity and Palm Springs in March
  84. IDEs and editors • Wing – different license levels, good people • PyScripter – open source, code completion • Komodo – free version also available • Notepad2 – ol’ standby editor • Notepad++ – people swear by it • PythonWin – another standby, but barebones • …dozens (at least) more editors out there…
  85. More reading • http://www.voidspace.org.uk/python/articles/OOP.shtml – great OOP article (which I used a lot)