▸ What is Civil Engineering? ▸ Structural ▸ Transportation ▸ Construction Management ▸ Geotechnical ▸ Environmental ▸ Water and Natural Resources Thesis Title: Rheologic and Flume Erosion Characteristics of Georgia Sediments at Bridge Piers
▸ Help industry with compliance and remediation ▸ Litigation support for allocation of environmental damages ▸ Grant-funded research with NCHRP, FHWA, WERF
▸ Build spatial databases of contaminant concentrations ▸ Scrape the web (brute force) for meteorologic and hydrologic data ▸ Statistical analysis of pollution mitigation and remediation systems ▸ Build inputs for, run, and analyze output from vetted numerical models ▸ Build tools to help my colleagues do the same
TALK ▸ My take on the general state of the practice in engineering consulting ▸ Characteristic challenges of environmental consulting ▸ Tools I like to use ▸ Tools I have built over the course of my career
▸ Overwhelmingly, everything is in MS Access ▸ PROTIP: Your MS Office, MS Access drivers, and Python must all have the same architecture (32- or 64-bit) to work together ▸ Last Client’s DB ▸ 110 tables ▸ 200 saved insert/create/update queries ▸ 100 saved select and pivot queries ▸ 25 queries that deleted stuff ▸ Everything connected with 40 VBA forms, 15 utility modules (9000 LOC)
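The architecture check behind the PROTIP can be sketched with the standard library alone. This is a minimal sketch, and the helper name `python_bitness` is hypothetical. It only reports whether the running interpreter is 32- or 64-bit; that value then has to be compared by hand against the installed MS Office and Access driver.

```python
import platform
import struct


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


bits = python_bitness()
print(f"Python is {bits}-bit on {platform.system()} ({platform.machine()})")
```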
▸ Excel is the behemoth that cannot be stopped ▸ Most people manage their data with it ▸ No one follows best practices (www.datacarpentry.org/spreadsheet-ecology-lesson) ▸ Will be pried from cold dead hands, despite well-known statistical errors (recently resolved, IIRC)
TOOLS ▸ GUI or cmd-line based numerical models maintained by federal agencies ▸ SWMM ▸ 1-D urban hydrology ▸ Written in very legible C ▸ Open source ▸ Maintained by EPA, transitioning to UT Austin ▸ Compiles on Linux and Windows ▸ HEC-RAS ▸ 2-D river hydraulics ▸ Completely closed source ▸ Guarded by US Army Corps of Engineers ▸ EFDC ▸ Spaghetti FORTRAN ▸ Compiles on Linux but results are garbage ▸ 3-D river hydraulics, sediment and pollutant fate and transport ▸ Very optimized: 1 yr simulation takes ~1 day
SPECIFIC TOOLS ▸ All (numerically) solve the St. Venant equations ▸ Differential equations in time and space ▸ Very difficult to even work with symbolically ▸ Large code bases ▸ Represent a significant intellectual investment from civil/environmental community
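For reference, one common 1-D form of the St. Venant equations is sketched below. The notation is an assumption and does not come from the slides: A is the flow area, Q the discharge, h the water depth, S_0 the bed slope, S_f the friction slope, and g gravitational acceleration, with no lateral inflow.

```latex
\begin{aligned}
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} &= 0 \\
\frac{\partial Q}{\partial t}
  + \frac{\partial}{\partial x}\left(\frac{Q^2}{A}\right)
  + g A \frac{\partial h}{\partial x} &= g A \left(S_0 - S_f\right)
\end{aligned}
```

The first line conserves mass and the second conserves momentum. The nonlinear Q²/A term is a large part of why these are hard to handle symbolically.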
OF THIS? ▸ Scrape model inputs from web ▸ Hack input files for batch processing ▸ Move data around between formats ▸ General data and model results analysis ▸ (Rarely) Wrap C/Fortran libraries with ctypes/cython/numpy
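Hacking input files for batch processing often amounts to filling in a text template once per scenario. The sketch below assumes a made-up input format: the template text, the `$rain_depth` and `$roughness` placeholders, and the `write_batch_inputs` helper are all hypothetical and do not match any real SWMM, HEC-RAS, or EFDC file layout.

```python
import itertools
import tempfile
from pathlib import Path
from string import Template

# Hypothetical input-file template; real model input formats differ.
TEMPLATE = Template(
    "[OPTIONS]\n"
    "RAIN_DEPTH  $rain_depth\n"
    "ROUGHNESS   $roughness\n"
)


def write_batch_inputs(outdir, rain_depths, roughnesses):
    """Write one input file per parameter combination; return the paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    combos = itertools.product(rain_depths, roughnesses)
    for n, (rain, rough) in enumerate(combos):
        path = outdir / f"scenario_{n:03d}.inp"
        path.write_text(TEMPLATE.substitute(rain_depth=rain, roughness=rough))
        paths.append(path)
    return paths


batch_dir = tempfile.mkdtemp()
input_files = write_batch_inputs(batch_dir, [1.0, 2.5], [0.013, 0.024, 0.035])
print(f"wrote {len(input_files)} input files to {batch_dir}")
```

Each generated file can then be handed to the model's command-line executable in a loop.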
Fast numeric arrays implemented in C ▸ Specialized scientific functions ▸ Matplotlib/seaborn: generic and statistical 2-D visualizations ▸ pandas/statsmodels: table-like data structures and statistical models ▸ Jupyter: interactive computing via notebooks ▸ conda: sane installation of python packages on Windows and other operating systems
bulk data download and parsing ▸ Creates a directory structure for each station, source, and stage of processing ▸ searches for existing files before attempting to download or parse ▸ High-level plotting functions
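The "look for an existing file first" behavior can be sketched as below. This is not the library's actual code. The `cached_fetch` helper, the station/source/stage directory names, and the fake downloader are all hypothetical stand-ins so that the example runs offline.

```python
import tempfile
from pathlib import Path


def cached_fetch(root, station, source, stage, filename, fetch):
    """Return the path to a data file, calling `fetch` only if it is missing.

    Files live under root/station/source/stage, one directory per level.
    `fetch` is any zero-argument callable that returns the file's bytes.
    """
    path = Path(root) / station / source / stage / filename
    if path.exists():
        return path  # already downloaded or parsed; skip the work
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch())
    return path


calls = []


def fake_download():
    # Stand-in for a real HTTP request; the data values are made up.
    calls.append(1)
    return b"date,precip\n2012-01-01,0.25\n"


root = tempfile.mkdtemp()
first = cached_fetch(root, "KPDX", "asos", "01-raw", "201201.csv", fake_download)
second = cached_fetch(root, "KPDX", "asos", "01-raw", "201201.csv", fake_download)
print(first, "downloads performed:", len(calls))
```

The second call finds the file on disk and never invokes the downloader, which is what makes re-running a long scraping job cheap.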
conda install --channel=phobson wqio ▸ Centered around examining the efficacy of so-called best management practices (BMPs) ▸ Handles left-censored data with regression on order statistics (ROS) ▸ No such thing as having “zero” pollution ▸ Can only say that things are “less than” the instrument’s precision ▸ Bias-corrected, accelerated non-parametric bootstrapping to estimate confidence intervals around statistics ▸ High-level plotting interface built on seaborn ▸ Basis for client-specific libraries that I distribute to them through conda
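The bias-corrected and accelerated (BCa) bootstrap can be sketched with the standard library alone. This is an illustration of the general technique, not wqio's implementation. The `bca_interval` helper and the sample concentrations are made up.

```python
import random
import statistics
from statistics import NormalDist


def bca_interval(data, statfxn=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap confidence interval."""
    rng = random.Random(seed)
    data = list(data)
    n = len(data)
    theta_hat = statfxn(data)
    norm = NormalDist()

    # Bootstrap distribution: resample with replacement, recompute the statistic.
    boot = sorted(statfxn(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction: where the estimate sits within the bootstrap distribution.
    below = sum(b < theta_hat for b in boot)
    frac = min(max(below / n_boot, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = norm.inv_cdf(frac)

    # Acceleration: skewness of the leave-one-out (jackknife) estimates.
    jack = [statfxn(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted_quantile(q):
        # Shift the nominal quantile using the bias and acceleration terms.
        z = norm.inv_cdf(q)
        p = norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        index = min(max(int(p * n_boot), 0), n_boot - 1)
        return boot[index]

    return adjusted_quantile(alpha / 2), adjusted_quantile(1 - alpha / 2)


# Made-up, right-skewed concentrations for illustration only.
data = [0.8, 1.1, 1.3, 1.9, 2.4, 2.6, 3.3, 4.7, 6.2, 9.5]
lo, hi = bca_interval(data)
print(f"mean = {statistics.mean(data):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

With skewed data like this, the BCa adjustment shifts both endpoints relative to a plain percentile bootstrap, which is the motivation for using it on pollutant concentrations.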
▸ http://phobson.github.io/paramnormal/index.html ▸ conda install --channel=phobson paramnormal ▸ Problem: scipy distributions are incredibly flexible, but also perhaps a bit over-generalized ▸ Wanted to create an API that lets statisticians of all calibers specify and fit distributions using the parameters they read about in textbooks ▸ The activity module provides a simple interface for creating, fitting, and plotting statistical distributions
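The mismatch between textbook and scipy parameters is easy to see with the lognormal distribution. The sketch below uses only scipy, not paramnormal, and the values of `mu` and `sigma` are arbitrary. A textbook describes the distribution by the mean and standard deviation of log(X), while `scipy.stats.lognorm` expects a shape `s` and a `scale`.

```python
import math

from scipy import stats

# Textbook parameters: mean and standard deviation of log(X).
mu, sigma = 1.5, 0.4

# scipy's lognorm takes a shape `s` (= sigma) and a `scale` (= exp(mu)).
dist = stats.lognorm(s=sigma, scale=math.exp(mu))

# Both pairs should agree: the median is exp(mu) and
# the mean is exp(mu + sigma**2 / 2).
print(dist.median(), math.exp(mu))
print(dist.mean(), math.exp(mu + sigma ** 2 / 2))
```

Having to remember that `scale` means exp(mu) is the kind of translation step the textbook-style API is meant to remove.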
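matplotlib ▸ Similar to a quantile plot, but expressed as a probability instead of a z-score ▸ Simply import probscale and you’re set

    from matplotlib import pyplot
    import paramnormal
    import probscale  # importing registers the 'prob' scale with matplotlib

    # two stacked axes, one per probability scale
    fig, (ax1, ax2) = pyplot.subplots(nrows=2)

    # default: normal probability scale
    ax1.set_xscale('prob')
    ax1.set_xlim(left=2, right=98)
    ax1.set_xlabel('Normal probability scale')

    # probability scale based on a beta distribution
    ax2.set_xscale('prob', dist=paramnormal.beta(α=3, β=2))
    ax2.set_xlim(left=2, right=98)
    ax2.set_xlabel('Beta probability scale (α=3, β=2)')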
grids are fairly difficult to compute; lots of very expensive proprietary stuff out there. ▸ Dr. Pavel Sakov (Australian Bureau of Meteorology) wrote gridgen-c ▸ Built a ctypes interface with help from: ▸ Dr. Robert Hetland (Texas A&M) ▸ Dr. Richard Signell (USGS) ▸ Mac and Linux binaries available through conda ▸ Docs at: phobson.github.io/pygridgen
install --channel=conda-forge pygridtools ▸ geosyntec.github.io/pygridtools ▸ Interactivity with grid generation parameters ▸ Simple, general manipulation methods (merge, split, refine, transform) ▸ Took inspiration from pandas/xarray/seaborn ▸ File IO with general and GIS data formats via fiona
MEAT-SPACE ENGINEER ▸ I was lucky enough to get started with numpy and matplotlib while: ▸ I was still very familiar with MATLAB ▸ Both projects were directly trying to take market share away from MATLAB ▸ APIs and efforts have changed since then. ▸ More recently, a new employee started with MATLAB experience from college. We gave them a copy of Python for Data Analysis by Wes McKinney (author of pandas), and they were pretty productive with Python within a couple of weeks.