
Advanced geoprocessing with Python


A four-hour training course given at the Mid-America GIS Consortium meeting in 2012 in Kansas City, MO.

Chad Cooper

April 23, 2012



Transcript

  1. Advanced geoprocessing with… MAGIC 2012 • Chad Cooper • Center for Advanced Spatial Technologies, University of Arkansas, Fayetteville
  2. Intros • your name • what you do/where you work • used Python much? – any formal training? – what do you use it for? • know any other languages?
  3. Objectives • informal class – expect tangents – code as we go • not geared totally to ArcGIS • THINK – oddball and out-of-the-ordinary applications will make you want more…
  4. Outline • data types review • functions • procedural vs. OOP • geometries • rasters • spatial references • error handling/logging • documentation • 3rd-party modules • module installation • the web – fetching – scraping – email – FTP • files
  5. Strings • ordered collections of characters • immutable – can’t change it • raw strings: • slicing • indexing: • iteration/membership
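The code that accompanied this slide did not survive in the transcript. A minimal sketch of the listed ideas follows; the path is a made-up example.

```python
# Raw strings: backslashes are not treated as escapes, handy for Windows paths
path = r"C:\data\gis\roads.shp"

# Indexing is zero-based; negative indexes count from the end
first_char = path[0]
last_char = path[-1]

# Slicing returns a new string: [start:stop], with stop excluded
extension = path[-4:]
drive = path[:2]

# Immutable: item assignment fails, so "changing" a string builds a new one
try:
    path[0] = "D"
    immutable = False
except TypeError:
    immutable = True
new_path = path.replace("roads", "rivers")

# Iteration and membership
has_gis = "gis" in path
backslashes = sum(1 for ch in path if ch == "\\")
```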
  6. Lists • list – ordered collection of arbitrary objects • ordered • mutable – you can change it • extend() concatenates lists
  7. Lists… • iterable – very important! • membership • nestable – 2D array/matrix • access by index – zero-based
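The two list slides can be sketched together in a few lines. The field names here are illustrative only.

```python
fields = ["NAME", "AREA"]
fields.append("PERIMETER")           # mutable: grow the list in place
fields.extend(["OWNER", "ZONING"])   # extend() concatenates another list onto this one
fields[0] = "PARCEL_NAME"            # change an item by (zero-based) index

first = fields[0]
last = fields[-1]                    # negative indexes count from the end
has_area = "AREA" in fields          # membership test

# Nesting: a list of lists works as a simple 2D array/matrix
grid = [[1, 2, 3],
        [4, 5, 6]]
row1_col2 = grid[1][2]               # second row, third column

# Iteration: any list can be looped over directly
name_lengths = []
for name in fields:
    name_lengths.append(len(name))
```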
  8. Dictionaries • unordered collection of arbitrary objects • key/value pairs – think hash/lookup table (keys don’t have to be numbers) • nestable, mutable • access by key, not offset
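A short dictionary sketch with made-up parcel data. One caveat for readers on newer interpreters: Python 3.7 and later dicts remember insertion order, while the Python 2 releases of this era did not. The loop sorts the keys so it does not depend on either behavior.

```python
# Keys can be any hashable value (strings, numbers, tuples), not just numbers
parcel = {"id": 1042, "owner": "Smith", "acres": 2.5}

owner = parcel["owner"]              # access by key, not by offset
parcel["acres"] = 3.0                # mutable: change a value in place
parcel["zoning"] = "R1"              # ...or add a new key/value pair

# get() returns a default instead of raising KeyError for a missing key
flood_zone = parcel.get("flood_zone", "unknown")

# Nesting: a dict of dicts makes a handy lookup table
county_seats = {
    "Washington": {"state": "AR", "seat": "Fayetteville"},
    "Benton": {"state": "AR", "seat": "Bentonville"},
}
seat = county_seats["Washington"]["seat"]

# Loop over keys; sorted() gives a predictable order
for key in sorted(parcel):
    print(key, parcel[key])
```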
  9. Tuples • ordered collection of arbitrary objects • immutable – cannot add, remove, or change items • access by offset • basically an unchangeable list • so what’s the purpose? – FAST – great for iterating over a constant set of values – SAFE – you can’t change it
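A sketch of tuples in practice. The extent numbers are illustrative, not real data. Lookups such as index() and count() still work on tuples; only modification is off limits.

```python
# A constant set of values: (xmin, ymin, xmax, ymax), illustrative numbers only
extent = (-94.6, 35.9, -93.9, 36.5)

first = extent[0]                        # access by offset
xmin, ymin, xmax, ymax = extent          # unpack into names
position = extent.index(-93.9)           # lookups still work: index() and count()

# Immutable: item assignment raises TypeError
try:
    extent[0] = 0.0
    is_immutable = False
except TypeError:
    is_immutable = True

# Tuples (of hashable items) can be dict keys; lists cannot
cell_values = {(0, 0): 12, (0, 1): 15}   # keyed by (row, column)

# Iterating over a fixed set of values
for field_type in ("TEXT", "SHORT", "DOUBLE"):
    print(field_type)
```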
  10. List comprehensions • map one list to another by applying a function to each of the list elements • the original list goes unchanged
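A minimal list comprehension sketch, converting made-up areas from square meters to acres (1 acre is about 4046.86 square meters), with the equivalent loop for comparison.

```python
areas_sq_m = [4046.86, 10000.0, 2500.0]

# Apply a conversion to every element, producing a new list
areas_acres = [round(a / 4046.86, 2) for a in areas_sq_m]

# An optional condition filters elements as they are mapped
large = [a for a in areas_sq_m if a >= 4000]

# The same mapping as an explicit loop
areas_acres_loop = []
for a in areas_sq_m:
    areas_acres_loop.append(round(a / 4046.86, 2))
```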
  11. Sets • unordered collections of objects • like mathematical sets – collection of distinct objects – NO DUPLICATES • example – get rid of dups in a list via list comp
  12. Sets • intersection – items in both sets • symmetric difference – items in one set or the other, but not both • difference – items in the first set but not the second
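The two set slides can be sketched together: de-duplicating a list (with and without keeping the original order) and the three set operations. The zoning codes and parcel IDs are made up.

```python
# Removing duplicates: a set keeps only distinct values (order is not preserved)
zoning_codes = ["R1", "C2", "R1", "AG", "C2", "R1"]
unique_sorted = sorted(set(zoning_codes))

# Keeping first-seen order with a list comprehension and a "seen" set:
# seen.add() returns None, so an item is kept only the first time it appears
seen = set()
unique_in_order = [c for c in zoning_codes if not (c in seen or seen.add(c))]

# Set operations
old_parcels = {101, 102, 103, 104}
new_parcels = {103, 104, 105}
in_both = old_parcels & new_parcels      # intersection
in_one_only = old_parcels ^ new_parcels  # symmetric difference
dropped = old_parcels - new_parcels      # difference: in the first, not the second
```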
  13. Programming paradigms: big blob of code • OK on a small scale for GP scripts • gets out of hand quickly • hard to follow • think ModelBuilder-exported code
  14. Programming paradigms: procedural programming • basically a list of instructions • program is built from one or more procedures (functions) – reusable chunks • procedures can be called at any time, from anywhere in the program • focus is on breaking the task into a collection of variables, data structures, and subroutines • natural style, easy to understand • strict separation between code and data
  15. Functions • portion of code within a larger program that performs a specific task • can be called any time, from anywhere • can accept arguments • should return a value • keeps code neat • promotes smooth flow
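A small function showing arguments, a default value, and a return value. The helper name is hypothetical; the conversion factor is the exact definition of the international acre.

```python
def acres_to_hectares(acres, precision=2):
    """Convert acres to hectares, rounded to `precision` decimal places."""
    hectares = acres * 0.40468564224   # 1 acre is exactly 0.40468564224 ha
    return round(hectares, precision)

ten_acres = acres_to_hectares(10)             # uses the default precision
one_acre = acres_to_hectares(1, precision=4)  # keyword argument overrides it
```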
  16. Programming paradigms: Object-oriented programming (OOP) • break the program down into data types (classes) that associate behavior (methods) with data (members or attributes) • code becomes more abstract • data and the functions for dealing with it are bound together in one object
  17. Programming paradigms: Object-oriented programming (OOP) • objects let you wrap complex processes, but present a simple interface to them • methods and attributes are encapsulated inside the object • methods and attributes are exposed to users • you can then update the object without breaking the interface • you can pass objects around – carefully
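A minimal class sketch of the two OOP slides: data and behavior bound together, with callers using a simple interface while storage stays internal. The `Parcel` class is hypothetical, and the leading underscore marks an internal detail only by convention; Python does not enforce it.

```python
class Parcel(object):
    """A parcel record: data (attributes) bound together with behavior (methods)."""

    SQ_M_PER_ACRE = 4046.8564224

    def __init__(self, parcel_id, area_sq_m):
        self.parcel_id = parcel_id
        self._area_sq_m = area_sq_m   # internal detail, by convention only

    def acres(self):
        """Public interface: callers never need to know how area is stored."""
        return self._area_sq_m / self.SQ_M_PER_ACRE

    def describe(self):
        return "Parcel {0}: {1:.2f} acres".format(self.parcel_id, self.acres())


p = Parcel(7, 8093.7128448)
print(p.describe())
```

Because callers only use `acres()` and `describe()`, the class could later store area in acres instead without breaking any calling code.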
  18. Programming paradigms: OOP – inheritance • classes can inherit attributes and methods • allows you to reuse and customize existing code inside a new class • you can override methods • you can add new methods to a class without modifying the existing class
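An inheritance sketch with hypothetical `Feature` and `Road` classes: the subclass reuses the parent's setup, overrides one method, and adds another while the parent class stays untouched.

```python
class Feature(object):
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Feature: {0}".format(self.name)


class Road(Feature):
    """Reuses Feature's setup, overrides describe(), adds is_highway()."""

    def __init__(self, name, lanes):
        super(Road, self).__init__(name)   # reuse the parent's __init__
        self.lanes = lanes

    def describe(self):                    # override the inherited method
        return "Road: {0} ({1} lanes)".format(self.name, self.lanes)

    def is_highway(self):                  # new method; Feature is unchanged
        return self.lanes >= 4


lake = Feature("Beaver Lake")
road = Road("I-49", 4)
```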
  19. Modularizing code • I’m lazy, so I want to reuse code • statement – call functionality in another module • have one custom module (a .py file) with code you use all the time • great way to package up helper functions • ESRI does this with ConversionUtils.py in C:\Program Files (x86)\ArcGIS\Server10.0\ArcToolBox\Scripts
  20. Geometries • hierarchy: – feature class is made of features – feature is made of parts – part is made of points • hierarchy in Pythonic terms: – part: – multipart polygon: – single part polygon with hole:
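The slide's own "Pythonic terms" examples were lost. This is a plain-Python sketch of the same hierarchy, assuming (x, y) tuples stand in for arcpy Point objects. It also assumes, as we understand arcpy's convention when reading geometries, that an interior ring within a single part follows a `None` separator. The helper functions are hypothetical.

```python
# A part: an ordered list of points; a closed ring repeats its first point
part = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]

# A multipart polygon: a list of parts
multipart_polygon = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)],
    [(20, 20), (20, 25), (25, 25), (25, 20), (20, 20)],
]

# A single-part polygon with a hole: one part, exterior ring, None, interior ring
polygon_with_hole = [
    [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0),
     None,
     (3, 3), (3, 6), (6, 6), (6, 3), (3, 3)],
]

def split_rings(part):
    """Split one part into rings at each None separator."""
    rings, current = [], []
    for pnt in part:
        if pnt is None:
            rings.append(current)
            current = []
        else:
            current.append(pnt)
    rings.append(current)
    return rings

def ring_area(ring):
    """Shoelace formula: area enclosed by a closed ring of (x, y) points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

rings = split_rings(polygon_with_hole[0])
net_area = ring_area(rings[0]) - sum(ring_area(r) for r in rings[1:])
```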
  21. Reading geometry • accessed through the geometry object of a feature • example: describe_geometry_arcmap.py – 1. open up SearchCursor, 2. loop through rows, 3. get geometry, 4. print out X, Y
  22. Writing geometry • point features are point objects; lines and polygons are arrays of point objects • geometry objects can be created using the Geometry, Multipoint, PointGeometry, Polygon, or Polyline classes
  23. Rasters • class – raster object: variable that references a raster dataset – gives access to raster properties • raster calculations – Map Algebra – can cast to Raster object for calculations
  24. Exception handling • it’s necessary, stuff fails • useful error reporting • proper application cleanup • combine it with logging
  25. Exception handling – try/except • most basic form of error handling • wrap the whole program or portions of code • use optional clause for cleanup – close open files – close database connections – check extensions back in
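A minimal try/except sketch using Python's `finally` clause for cleanup, which runs whether or not an exception occurred. The helper is hypothetical, and a `with` block would close the file just as well; `finally` is shown because it also covers cleanup like closing database connections or checking extensions back in.

```python
import os
import tempfile

def count_lines(path):
    """Count lines in a text file, closing the file even if reading fails."""
    f = None
    try:
        f = open(path)
        return sum(1 for _ in f)
    except IOError as e:          # IOError is an alias of OSError in Python 3
        print("Could not read {0}: {1}".format(path, e))
        return None
    finally:
        if f is not None:         # cleanup runs on success and on failure
            f.close()

tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("a\nb\nc\n")
tmp.close()
n_lines = count_lines(tmp.name)
missing = count_lines(os.path.join(tempfile.mkdtemp(), "no_such_file.txt"))
os.remove(tmp.name)
```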
  26. Exception handling – raise • allows you to force an exception to occur • can be used to signal problem conditions
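A sketch of `raise`: a built-in exception for a bad input, plus a custom exception class. `ExtensionNotAvailable` and `require_extension` are hypothetical, and `is_available` stands in for a real license check.

```python
class ExtensionNotAvailable(Exception):
    """Hypothetical custom exception for a missing extension license."""
    pass

def check_cell_size(cell_size):
    """Stop early with a clear message instead of failing later in a tool."""
    if cell_size <= 0:
        raise ValueError("Cell size must be positive, got {0}".format(cell_size))
    return cell_size

def require_extension(is_available, name="Spatial Analyst"):
    """is_available is a stand-in for a real license check (an assumption)."""
    if not is_available:
        raise ExtensionNotAvailable("{0} extension is not available".format(name))
```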
  27. Exception handling – AddError and traceback • – returns GP-specific errors • – prints stack trace; determines precise location of error – good for larger, more complex programs
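The slide's exact calls were not preserved. As one standard-library way to get the stack trace, here is a sketch using `traceback.format_exc()`, which returns the full trace as a string with file names and line numbers. The comment about `arcpy.AddError()` is a suggestion only; arcpy is not imported here.

```python
import traceback

def divide(a, b):
    return a / b

try:
    divide(1, 0)
except ZeroDivisionError:
    # Full stack trace as a string: every frame, then the exception type and message
    tb_text = traceback.format_exc()
    print(tb_text)
    # In an ArcGIS script tool this string could be passed to arcpy.AddError()
```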
  28. Logging • logging module • logging levels: – : detailed; for troubleshooting – : normal operation, statuses – : still working, but unexpected behavior – : more serious, some function not working – : program cannot continue
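Python's logging module defines five standard levels, DEBUG, INFO, WARNING, ERROR, and CRITICAL, which line up in order with the descriptions on this slide. This sketch shows how a logger's threshold drops anything below it. The logger name and messages are made up.

```python
import logging

# Standard levels, lowest to highest severity
LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]

logging.basicConfig(format="%(levelname)s: %(message)s")
log = logging.getLogger("gp_levels_demo")
log.setLevel(logging.INFO)   # threshold: anything below INFO is dropped

log.debug("Opened cursor on roads")              # detailed; suppressed here
log.info("Processing 42 features")               # normal operation
log.warning("Feature 17 has an empty geometry")  # still working, but unexpected
log.error("Buffer failed for feature 23")        # some function not working
log.critical("Workspace unavailable; stopping")  # program cannot continue
```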
  29. Meaningful logging • “customize” the logger • add info-level message(s) to be logged • log errors to a log file • can get much more advanced, see the docs
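One way to "customize" a logger, sketched with standard logging calls: a file handler with a timestamped format, info-level status messages, and `logger.exception()`, which logs at ERROR level and appends the traceback. `make_logger` is a hypothetical helper; calling it twice with the same name would attach a second handler.

```python
import logging
import os
import tempfile

def make_logger(log_path, name="gp_script"):
    """Hypothetical helper: a logger that writes timestamped lines to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
    logger.addHandler(handler)
    return logger

log_path = os.path.join(tempfile.mkdtemp(), "process.log")
log = make_logger(log_path)
log.info("Started clip of parcels")
try:
    1 / 0
except ZeroDivisionError:
    log.exception("Clip failed")   # ERROR level plus the stack trace

for h in list(log.handlers):       # flush and release the log file
    h.close()
    log.removeHandler(h)

with open(log_path) as f:
    log_text = f.read()
```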
  30. Code documentation • Pythonic standards covered in PEPs 8 and 257 • help() • comments need to be worth it • name items well • be precise and compact • comments may be for you
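A docstring sketch following the PEP 257 layout: a one-line summary, a blank line, then details. The Args/Returns sections are one common convention, not something PEP 257 requires, and the function is hypothetical. `help(buffer_distance)` would print this text; it is also stored on `__doc__`.

```python
def buffer_distance(speed_kmh, minutes):
    """Return the distance in meters covered at a given speed.

    Args:
        speed_kmh: travel speed in kilometers per hour.
        minutes: travel time in minutes.

    Returns:
        Distance in meters, suitable as a buffer distance.
    """
    # 1 km/h covers 1000 m in 60 minutes
    return speed_kmh * 1000.0 * minutes / 60.0

summary = buffer_distance.__doc__.splitlines()[0]
```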
  31. Creating documentation • – built-in; used by help() – generate HTML on any module – kinda plain • – old, rumored to be dead – produces nicely formatted HTML – easy to install and use • Sphinx framework – “intelligent and beautiful documentation” – all the cool kids are using it (docs.python.org) – more involved to set up and use
  32. Installing packages (on Windows) • Windows executables • Python eggs – .zip file with metadata, renamed .egg – distributes code as a bundle – need easy_install • pip – tool for installing and managing Python packages – replacement for easy_install
  33. pip • can take care of dependencies for you • uninstallation! • install via , ironically –
  34. virtualenv • a tool to create isolated Python environments • manage dependencies on a per-project basis, rather than installing globally • test modules without installing into site-packages • avoid unintentional upgrades
  35. virtualenv • install via pip, easy_install, or by • create the env • activate the env • use the env
  36. virtualenv • installs Python where you tell it and modifies the system path to point there – good only while the env is activated • use yolk to list installed packages in the env • but can this work in the ArcMap Python prompt?
  37. virtualenv • YES, with a little work... • tells ArcMap to use the Python interpreter in our virtualenv – kill ArcMap and it goes back to using the default interpreter
  38. The web • infinite source of information • right-click and “Save as” is so lame (and too much work) • Python can help you exploit the web – ftplib, http (urllib), mechanize, scraping (Beautiful Soup), send email (smtplib)
  39. Fetching data • built-in libraries for FTP and HTTP • ftplib – log in, navigate to a directory, retrieve files • urllib/urllib2 – pass in the URL you want, get it back • wget – GNU command-line tool – can call with os.system()
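A minimal download sketch for Python 3, where urllib2's `urlopen` now lives in `urllib.request`. It streams the response to disk in chunks. The same call handles http://, https://, and ftp:// URLs; a local file:// URL is used here only so the demo runs offline. `fetch` is a hypothetical helper.

```python
import os
import shutil
import tempfile
from pathlib import Path
from urllib.request import urlopen   # Python 2 equivalent: urllib2.urlopen

def fetch(url, dest_path):
    """Download url and write the response body to dest_path."""
    with urlopen(url, timeout=60) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)   # copies in chunks, not all at once
    return dest_path

# Offline demo: a file:// URL stands in for a real web or FTP address
src = os.path.join(tempfile.mkdtemp(), "source.txt")
with open(src, "w") as f:
    f.write("station,rain_in\nFAY,1.2\n")
copy_path = fetch(Path(src).as_uri(), src + ".copy")
```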
  40. Scraping • scrape data from a web page • well-structured content is a HUGE help, as is valid markup, which isn’t always there • BeautifulSoup 3rd-party module – built-in methods and regexes help out – great for getting at tables of data
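A table-scraping sketch. Note that it swaps in the standard library's `html.parser` instead of BeautifulSoup, so it runs without third-party installs; BeautifulSoup is far more forgiving of invalid markup. `TableScraper` is a hypothetical class, and the HTML is a made-up sample.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:   # ignore text outside table cells
            self._cell.append(data)

page = """<table>
<tr><th>Station</th><th>Rain (in)</th></tr>
<tr><td>Fayetteville</td><td>1.2</td></tr>
<tr><td>Conway</td><td>0.8</td></tr>
</table>"""

scraper = TableScraper()
scraper.feed(page)
```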
  41. Emailing • smtplib built-in library • best if you have the IP of your email server • port blocking can be an issue • there’s always Gmail too…
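An email sketch for Python 3.6 or later, using `email.message.EmailMessage` to build the message and `smtplib` to send it. The addresses and server name are placeholders, and the send call is commented out because it needs a reachable SMTP server. Blocked ports or an authenticated server such as Gmail would need a different port plus `starttls()` and `login()`.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_message(msg, host, port=25):
    """Send through an SMTP server; host and port are placeholders."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.send_message(msg)

msg = build_message("gis@example.com", "boss@example.com",
                    "Nightly geoprocessing finished", "All 12 tools completed.")
# send_message(msg, "mail.example.com")   # uncomment with a real server
```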
  42. Files • built-in open function – slurp entire file into memory – OK except for huge files • iterate over the lines • csv module
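The three approaches on this slide, sketched against a small made-up CSV file written to a temporary folder.

```python
import csv
import os
import tempfile

csv_path = os.path.join(tempfile.mkdtemp(), "wells.csv")

# Write a small sample file with the csv module
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["WELL_ID", "DEPTH_FT"])
    writer.writerow(["W-1", "120"])
    writer.writerow(["W-2", "95"])

# 1. Slurp the whole file into memory: fine for small files
with open(csv_path) as f:
    everything = f.read()

# 2. Iterate line by line: only one line in memory at a time
with open(csv_path) as f:
    line_count = sum(1 for line in f)

# 3. Let csv handle splitting and quoting; DictReader keys each row by the header
with open(csv_path, newline="") as f:
    depths = {row["WELL_ID"]: int(row["DEPTH_FT"]) for row in csv.DictReader(f)}
```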
  43. Excel • love, hate, love • many modules out there – xlrd (read) / xlwt (write) – only .xls – openpyxl – read/write .xlsx • uses – push text data to an Excel file – push feature class data to Excel programmatically – read someone else’s “database”
  44. Databases • you can connect to pretty much ANY database • is there one true solution?? • pyodbc – Access, SQL Server, MySQL • Oracle – cx_Oracle • others – pymssql, _mssql, MySQLdb • execute SQL statements through a connection
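A sketch of executing SQL through a connection. It swaps in the standard library's `sqlite3` so it runs anywhere; pyodbc, cx_Oracle, and MySQLdb follow the same DB-API 2.0 pattern (connect, cursor, execute, fetch, commit, close), though the parameter placeholder style varies by driver. The table and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE permits (id INTEGER PRIMARY KEY, county TEXT, acres REAL)")
    # "?" placeholders (used by sqlite3 and pyodbc) let the driver quote values safely
    cur.executemany(
        "INSERT INTO permits (county, acres) VALUES (?, ?)",
        [("Washington", 12.5), ("Benton", 3.0), ("Washington", 7.5)],
    )
    conn.commit()
    cur.execute("SELECT county, SUM(acres) FROM permits GROUP BY county ORDER BY county")
    totals = cur.fetchall()
finally:
    conn.close()
```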
  45. Resources – FREE • Dive into Python • Python Cookbook • Think Python • Python docs • gis.stackexchange.com • Google is your friend (as always) • Python community is HUGE and GIVING
  46. Conferences • pyArkansas – annually in Conway – pyar2 list on python.org • PyCon – THE national US Python conference • FOSS4G – international open source for GIS conference • ESRI Developer Summit – major dork-fest, but a great learning opportunity, and Palm Springs in March
  47. IDEs and editors • Wing – different license levels, good people • PyScripter – open source, code completion • Komodo – free version also available • Notepad2 – old standby editor • Notepad++ – people swear by it • PythonWin – another standby, but barebones • …dozens (at least) more editors out there…