Python for GIS...and then some

Chad Cooper
September 01, 2011

Intermediate Python for GIS course taught at the 2011 Arkansas GIS Users Forum meeting, Bentonville, AR

Transcript

  1. Python for GIS…and then some
    2011 AR GIS User’s Forum Conference
    Chad Cooper
    Center for Advanced Spatial Technologies
    University of Arkansas, Fayetteville
  2. Intros
    • Name
    • What you do/where you work
    • Used Python much?
      – Any formal training?
      – What do you use it for?
    • Know any other languages?
  3. Objectives
    • Informal class
      – Ask questions; you will stump me, but we will find an answer
      – Expect tangents
    • NOT geared totally to ArcGIS
    • Let’s cover some important basics
    • Python for accomplishing other tasks
    • THINK – oddball and out-of-the-ordinary applications will make you want more…
  4. (VERY Rough) Outline
    • Strings/operations
    • Lists, dictionaries, tuples, sets
    • File input/output
    • The web
      – Fetching
      – Scraping
      – Email
      – FTP
    • Regular expressions
    • Logging
    • Excel
    • Exception handling
    • ArcGIS
    • Databases
    • Resources
    • SWAG!
  5. Strings
    • Ordered collections of characters
    • Immutable
    • Raw strings (a raw string cannot end with a lone backslash): path = r"C:\temp\chad"
    • Indexing: fruit[0] >> 'b'
    • Slicing: fruit[1:3] >> 'an'
    • Iteration/membership: for each in fruit / 'f' in fruit
    • String formatting: 'a %s parrot' % 'dead' >> 'a dead parrot'
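The string operations above can be sketched as follows. The slide never defines `fruit`; the value 'banana' is an assumption inferred from the 'b' and 'an' results shown.

```python
# Assumption: the slide's `fruit` is 'banana' (consistent with the 'b' and 'an' results).
fruit = 'banana'

first = fruit[0]          # indexing: one character by position
middle = fruit[1:3]       # slicing: characters 1 up to (not including) 3
has_f = 'f' in fruit      # membership test

# Iteration visits each character in order.
letters = [each for each in fruit]

# Raw string: backslashes are kept literally, so Windows paths need no doubling.
path = r"C:\temp\chad"

# %-style formatting substitutes the value into the %s placeholder.
sentence = 'a %s parrot' % 'dead'

# Strings are immutable: item assignment raises TypeError.
try:
    fruit[0] = 'B'
    changed = True
except TypeError:
    changed = False

print(first, middle, has_f, sentence)
```

Because strings cannot be modified in place, every "change" (replace, upper, strip and so on) returns a new string that you assign to a name.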
  6. Lists
    • List – ordered collection of arbitrary objects
        list1 = [0,1,2,3]
        list2 = ['zero','one','two','three']
        list3 = [0,'zero',1,'one',2,'two',3,'three']
    • Ordered
        list2.sort()                   list2 is now ['one','three',...]
        list2.sort(reverse=True)       list2 is now ['zero','two',...]
    • Mutable – you can change it
        list1.append(4)                list1 is now [0,1,2,3,4]
        list1.reverse()                list1 is now [4,3,2,1,0]
        list2.insert(0,'one-half')     list2 is now ['one-half','zero',…]
        list2.extend(['four','five'])  <- extend concatenates lists
  7. Lists…
    • Iterable – very important!
        for l in list3:                visits 0, 'zero', ...
    • Membership
        3 in list3 >> True
    • Nestable – 2D array/matrix
        list4 = [[0,1,2], [3,4,5], [6,7,8]]
    • Access by index – zero based
        list4[1] >> [3,4,5]
        list4[1][2] >> 5
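The list operations from the two slides above, collected into one runnable sketch. The copies (`sorted_names`, `reversed_names`) and `row_sums` are names added here for illustration; they are not in the slides.

```python
list1 = [0, 1, 2, 3]
list2 = ['zero', 'one', 'two', 'three']

# sort() and reverse() change the list in place and return None.
list2.sort()
sorted_names = list(list2)           # snapshot after the alphabetical sort
list2.sort(reverse=True)
reversed_names = list(list2)         # snapshot after the reverse sort

list1.append(4)                      # add one item to the end
list1.reverse()                      # flip the order in place
list2.insert(0, 'one-half')          # insert at position 0
list2.extend(['four', 'five'])       # concatenate another list onto the end

# Nested lists behave like a 2D array: first index is the row, second the column.
list4 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
row = list4[1]
cell = list4[1][2]

# Iterating a nested list row by row.
row_sums = [sum(r) for r in list4]
```

Note that `list2.sort()` returns `None`, so writing `list2 = list2.sort()` throws the list away; use `sorted(list2)` when you want a new sorted list instead.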
  8. Dictionaries
    • Unordered collection of arbitrary objects
        d = {1:'foo', 2:'bar'}
    • Key/value pairs – think hash/lookup table (keys don’t have to be numbers)
        d.keys() >> [1, 2]
        d.values() >> ['foo','bar']
    • Nestable, mutable
        d[3] = 'spam'
        del d[key]
    • Access by key, not offset
        d[2] >> 'bar'
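A short sketch of the dictionary operations above. The string key, the `.get()` call, and the nested `counties` lookup are illustrative additions, not taken from the slide.

```python
# Keys can be any hashable object, not just numbers.
d = {1: 'foo', 2: 'bar'}
d[3] = 'spam'            # add a new key/value pair
d['name'] = 'bridge'     # string key (hypothetical example value)
del d[1]                 # remove a pair by key

value = d[2]                       # access by key, not by position
missing = d.get(99, 'n/a')         # .get() avoids a KeyError for absent keys
int_keys = sorted(k for k in d if isinstance(k, int))

# Nesting: a dictionary of dictionaries works as a small lookup table.
counties = {
    'Benton': {'seat': 'Bentonville'},
    'Washington': {'seat': 'Fayetteville'},
}
seat = counties['Benton']['seat']

# Iterate over key/value pairs together.
for key, val in d.items():
    print(key, val)
```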
  9. Tuples
    • Ordered collection of arbitrary objects
    • Immutable – cannot add, remove, or change elements
    • Access by offset
    • Basically an unchangeable list: (1,2,'three',4,…)
    • So what’s the purpose?
      – FAST – great for iterating over a constant set of values
      – SAFE – you can’t change it
  10. Sets
    • Unordered collections of objects
    • Like mathematical sets – collection of distinct objects – NO DUPLICATES
    • Example – get rid of dups in a list
        L1 = [2,2,3,4,5,5,3]
        L2 = []
        [L2.append(x) for x in L1 if x not in L2]
        >>> L2
        [2, 3, 4, 5]
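The slide's example removes duplicates with a list and a membership test. As a complementary sketch, the same job can be done with the built-in `set` type itself; the variable names below are illustrative.

```python
L1 = [2, 2, 3, 4, 5, 5, 3]

# set() drops duplicates but does not promise any particular order,
# so sort the result if order matters.
unique_sorted = sorted(set(L1))

# To keep first-seen order, track what has been seen in a set.
seen = set()
unique_in_order = []
for x in L1:
    if x not in seen:
        seen.add(x)
        unique_in_order.append(x)

# Set algebra, as with mathematical sets.
a = set([1, 2, 3])
b = set([3, 4])
common = a & b       # intersection
either = a | b       # union
only_a = a - b       # difference
```

Membership tests against a set stay fast as the data grows, whereas `x not in L2` on a list scans the whole list each time.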
  11. List comprehensions
    • Map one list to another by applying a function to each of the list elements
    • Original list goes unchanged
        L = [2,4,6,8]
        J = [elem * 2 for elem in L]
        >>> J
        [4, 8, 12, 16]
  12. Files
    • Built-in open function – slurp entire file into memory – OK except for huge files
        data = open(file).read().splitlines()
    • Iterate over the lines
        for line in data:
            do something
    • CSV module
        import csv
        reader = csv.reader(open('C:/file.csv','rb'))
        for line in reader:
            do something
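A runnable sketch of both approaches above, written for Python 3 (the slides target Python 2.6, where the csv file is opened in 'rb' mode). The temporary file and its contents are assumptions standing in for the slide's C:/file.csv.

```python
import csv
import os
import tempfile

# Assumption: a small throwaway CSV stands in for the slide's C:/file.csv.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'name'])
    writer.writerow(['1', 'bridge'])
    writer.writerow(['2', 'culvert'])

# Slurp the whole file into a list of lines (fine for small files).
# The with block closes the file even if an error occurs.
with open(path) as f:
    data = f.read().splitlines()

# csv.reader splits each line into a list of fields.
# In Python 3, open with newline='' instead of the Python 2 'rb' mode.
with open(path, newline='') as f:
    rows = [row for row in csv.reader(f)]

header, records = rows[0], rows[1:]
```

For huge files, loop over the file object directly (`for line in f:`) so only one line is held in memory at a time.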
  13. Exercise 1
    • Work with csv file (csv module) – C:\temp\python\simple-csv.csv
    • Read into memory (create reader, open file)
    • Print it out, slice it up, use indexes
    • Put contents into a dictionary (built-in zip function)
    • Put dictionary items into a list (list.append(dictionary item))
    • Exercise-1.py and Exercise-1B_Write_Text_File.py
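The dictionary step of the exercise might look something like the sketch below. The exercise scripts themselves are not shown in the slides, so the header and rows here are hypothetical stand-ins for the contents of simple-csv.csv.

```python
# Hypothetical header and rows, standing in for the contents of simple-csv.csv.
header = ['id', 'name', 'county']
rows = [
    ['1', 'bridge', 'Benton'],
    ['2', 'culvert', 'Washington'],
]

records = []
for row in rows:
    # zip pairs each header field with the value in the same position;
    # dict() turns those pairs into key/value entries.
    record = dict(zip(header, row))
    records.append(record)

for record in records:
    print(record['name'], record['county'])
```

Each row becomes a dictionary keyed by column name, so later code can say `record['county']` instead of remembering that county is column 2.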
  14. The web
    • Infinite source of information
    • Right-click and “Save as” is so lame
    • Python can help you exploit the web
      – ftplib, http (urllib), mechanize, scraping (Beautiful Soup), send email (smtplib)
  15. Fetching data
    • Built-in libraries for ftp and http
    • ftplib – log in, nav to directory, retrieve files
    • urllib/urllib2 – pass in the url you want, get it back
    • wget – GNU command-line tool
      – Can call with os.system()
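A minimal sketch of the "pass in the url, get it back" idea, written for Python 3, where the urllib/urllib2 functionality lives in `urllib.request`. The `fetch` helper and the example URL are hypothetical, not from the course materials.

```python
import shutil
from urllib.request import urlopen  # Python 3 home of the old urllib/urllib2 calls


def fetch(url, dest_path):
    """Download url and save the bytes to dest_path; returns dest_path."""
    response = urlopen(url)
    try:
        with open(dest_path, 'wb') as out:
            # Stream the response to disk in chunks rather than all at once.
            shutil.copyfileobj(response, out)
    finally:
        response.close()
    return dest_path


# Usage (hypothetical URL and file name):
# fetch('http://example.com/program.pdf', 'program.pdf')
```

Opening the output file in 'wb' mode matters for binary downloads such as PDFs and zipped data, which text mode could corrupt on Windows.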
  16. Scraping
    • Scrape data from a web page
    • Well-structured content is a HUGE help, as is valid markup, which isn’t always there
    • BeautifulSoup 3rd-party module
      – Built-in methods and regexes help out
      – Great for getting at tables of data
  17. Emailing
    • smtplib built-in library
    • Best if you have the IP of your email server
    • Port blocking can be an issue
        import smtplib
        server = smtplib.SMTP(email_server_ip)
        msg = 'All TPS reports need new cover sheets'
        server.sendmail('[email protected]', '[email protected]', msg)
        server.quit()
    • There’s always Gmail too…
  18. Exercise 2
    • Go over an FTP example together
    • Fetch some data from the web using urllib
    • Go to the AR GIS User’s Forum site and pull down the conference program pdf (urllib)
    • Exercise-2.py
    • BS_Scrape.py
    • Fetching_Data_Example.py
    • Fetching_Get_DRGs_Example.py
  19. Regular Expressions
    • Powerful, standardized searching, replacing, and parsing of text with complex patterns of characters
    • An incredibly complex topic
    • Simple ones can be sooooo helpful
    • re module in standard library
    * Patience required
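A few of the "simple ones" in action with the standard `re` module. The sample text and patterns are made up for illustration.

```python
import re

# Hypothetical messy text of the sort found in scraped pages or old reports.
text = 'Bridge 04512 inspected 2011-08-15; bridge 00871 inspected 2010-11-02.'

# \d{4}-\d{2}-\d{2} matches ISO-style dates: 4 digits, 2 digits, 2 digits.
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)

# Parentheses capture just the part you want;
# re.IGNORECASE matches both "Bridge" and "bridge".
bridge_ids = re.findall(r'bridge (\d+)', text, re.IGNORECASE)

# re.sub replaces every match; here runs of whitespace collapse to one space.
cleaned = re.sub(r'\s+', ' ', 'too    many   spaces')

# Compile a pattern once when it is reused, then pull out named groups.
date_re = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
match = date_re.search(text)
year = match.group('year')
```

Writing patterns as raw strings (`r'...'`) keeps the backslashes intact, which ties back to the raw-string point on the Strings slide.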
  20. Modularizing code
    • I’m lazy, so I want to reuse code
    • import statement – call functionality in another module
    • Have one custom module (a .py file) with code you use all the time
    • Great way to package up helper functions
    • ESRI does this with ConversionUtils.py
        C:\Program Files (x86)\ArcGIS\Server10.0\ArcToolBox\Scripts
  21. Excel
    • Love, hate, love
    • Many modules out there
      – xlrd (read) / xlwt (write) – only .xls
      – openPyXL – read/write .xlsx
    • Uses
      – Push text data to Excel file
      – Push featureclass data to Excel programmatically
      – Read someone else’s “database”
  22. Exception Handling
    • It’s necessary
    • Useful error reporting
    • Proper application cleanup
    • Combine with logging
        try:
            do something...
        except:
            handle error...
        finally:
            clean up...
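The try/except/finally skeleton above, filled in as a small sketch that also logs the error. The `parse_count` function and the logger name are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('nightly_job')   # hypothetical logger name


def parse_count(value):
    """Convert value to int; returns None (and logs) if it is not a number."""
    try:
        return int(value)
    except ValueError:
        # Catch the specific error you expect; a bare except hides real bugs.
        # log.exception records the message plus the full traceback.
        log.exception('Could not parse %r', value)
        return None
    finally:
        # Runs whether or not an exception occurred: the place for cleanup.
        log.info('Finished parsing %r', value)


good = parse_count('42')
bad = parse_count('forty-two')
```

The `finally` block still runs even though both branches above return, which is what makes it a reliable place to close files, cursors, or connections.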
  23. Logging
    • Log files can save you
    • Most code runs in background, so you get no console output
    • Great for timing processing and debugging
    • Append or write
    • Environment: dev/test/prod
    • Two options:
      – logging module
      – Just write out to text file
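A sketch of the first option, the `logging` module, writing timestamped messages to a file and timing a step. The log location, logger name, and the stand-in "work" are assumptions for illustration.

```python
import logging
import os
import tempfile
import time

# Assumption: the log goes to a temp directory; point log_path wherever suits you.
log_path = os.path.join(tempfile.mkdtemp(), 'process.log')

logger = logging.getLogger('process')        # hypothetical logger name
logger.setLevel(logging.DEBUG)               # e.g. DEBUG in dev, INFO in prod
handler = logging.FileHandler(log_path, mode='a')   # 'a' appends, 'w' overwrites
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)

start = time.time()
logger.info('Processing started')
total = sum(range(1000))                     # stand-in for real work
logger.info('Processing finished in %.3f seconds', time.time() - start)

handler.close()                              # flush and release the file
logger.removeHandler(handler)
```

The timestamp in each line comes from the formatter, so a scheduled job that prints nothing to the console still leaves a record of when each step ran and how long it took.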
  24. Databases
    • You can connect to pretty much ANY database
    • Is there one true solution??
    • pyodbc – Access, SQL Server, MySQL
    • Oracle – cx_Oracle
    • Others – pymssql, _mssql, MySQLdb
    • Execute SQL statements through a connection
        conn = library.connect(driver/user/pwd)
        cursor = conn.cursor()
        for row in cursor.execute(sql):
            …do something
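The connect / cursor / execute pattern above can be tried without a database server by using `sqlite3` from the standard library. This is a stand-in, not one of the drivers the slide lists; the table, columns, and rows are made up, and the placeholder style (`?` here) varies between drivers.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server;
# the table and column names are made up for illustration.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE bridges (id INTEGER, name TEXT, year_built INTEGER)')
cursor.executemany(
    'INSERT INTO bridges VALUES (?, ?, ?)',
    [(1, 'Main St', 1954), (2, 'River Rd', 1987), (3, 'Hwy 71', 2003)],
)
conn.commit()

# Use placeholders rather than building SQL with string formatting.
sql = 'SELECT name, year_built FROM bridges WHERE year_built < ? ORDER BY year_built'
old_bridges = []
for row in cursor.execute(sql, (1990,)):
    old_bridges.append(row)      # each row comes back as a tuple

conn.close()
```

Passing values separately from the SQL text lets the driver handle quoting, which avoids both broken queries and SQL injection.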
  25. ArcGIS
    • Python support continues to improve
      – Python support for Label Expressions in 10.1!!!
    • ArcPy – rich native Python site-package
      – Successor to arcgisscripting
    • Organized in tools, functions, classes, modules
    • Very well documented, but daunting
    • Must have Python 2.6! At least according to ESRI…
  26. Exercise
    • Use what we have learned
    • Nasty National Bridge Inventory data
      – Fetch it
      – Parse it
      – Process it
      – Push to file geodatabase table
      – Push to file geodatabase featureclass
    • NBI_Data_Processing.py
  27. Resources - FREE
    • Dive into Python
    • Python Cookbook
    • Think Python
    • Python docs
    • gis.stackexchange.com
    • Google is your friend (as always)
    • Python community is HUGE and GIVING
  28. Conferences
    • pyArkansas – October 22, UCA Conway
    • PyCon – THE national US Python conference
    • FOSS4G – international open source for GIS
    • ESRI Developer Summit – major dork-fest, but great learning opportunity and Palm Springs in March
  29. IDEs and editors
    • Wing – different license levels, good people
    • Komodo – free version also available
    • Notepad2 – ol’ standby editor
    • Notepad++ – people swear by it
    • PythonWin – another standby
    • …dozens (at least) more editors out there…
  30. Other fun stuff
    • Geopy
    • APIs
      – Flickr
      – Google
    • XML processing
    • SWAG