Time Savings with Python

ATX GIS Day
November 13, 2019

Sesha Choutagunta, Graphic Design Consultant, NTT Data

Transcript

  1. • Strava is a social fitness network primarily used to track cycling and running with GPS data. Its data is crowd-sourced from cyclists and runners who record their activities with a smartphone app.
     • TxDOT-PTN acquired statewide Strava data in coordination with TxDOT-PTN’s bike/ped counting research project.
  2. Beating the clock
     Problem: Every quarter, I am tasked with downloading new bike and pedestrian data from Strava, extracting it, and performing a quality check. Afterward, I delete the redundant data (shapefiles and related tables), keep the geodatabases, and rename each geodatabase after its parent folder. The process may or may not sound complicated, but it is definitely time-consuming! Is there a way to save time and automate it?
     Solution: Yes! Python is the answer. A Python program can speed up this process and minimize manual interaction.
  3. Before: Prior to writing a Python program, I had to go through the following steps manually:
     1. Download data from Strava’s FTP site to the TxDOT network drive (I:).
     2. Extract the data from the I: drive to a separate J: drive, so the data on the I: drive is kept as a backup of the original download.
     3. Check the extracted data on the J: drive for corrupt files.
     4. The data comes in two formats, shapefiles and file geodatabases. Since the data is the same in both formats and all users in our organization have access to ArcGIS software, we keep the geodatabases (feature classes and tables) and delete the shapefiles and related CSV and DBF files to save space on the network. Finally, rename each geodatabase after its parent folder.
     Put together, it definitely sounded complicated and was still time-consuming. It also needed constant manual interaction and monitoring.
  4. Step 1: A script tool downloads Strava quarterly data for TxDOT’s 25 districts and statewide onto the network I: drive.
     Step 2: Extract data from the I: drive to the J: drive. Folders named with district abbreviations on the I: drive are extracted into folders named with the districts’ full names on the J: drive.
     Step 3: Check the extracted data on the J: drive for corrupt files.
     Step 4: Delete the directories with shapefiles to save space on the network, then rename each geodatabase after its parent folder.
     After: Manage time wisely and reduce human error by running these steps through a series of Python script tools.
  5. Step 1: Python script to download data from the FTP site
     (Screenshots: script tool with input parameters, output in Windows Explorer, geoprocessing results)
  6. Step 1: Python script to download data from the FTP site
     1. Import the required modules.
     2. Set input variables… using the arcpy module.
     3. Create a function to download a single file from the FTP site.
     4. Access the FTP host server using the ftplib module.
     5. Get a list of all district folders and files on the source FTP site and create the same folder structure at the destination.
     6. Loop through the source folders and files and download each file by calling the single-file download function.
  7. Step 2: Extract data from the I: drive onto the J: drive
     (Screenshots: script tool with input parameters, output in Windows Explorer, geoprocessing results)
  8. Step 2: Extract data from the I: drive onto the J: drive
     1. Import the required modules.
     2. Set input variables using the arcpy module.
     3. Create a dictionary whose keys are the source district abbreviations and whose values are the destination district full names.
     4. Using os.walk, search the source district folders for .tgz files, get each file’s parent folder name, and create the corresponding destination district folder from the dictionary.
     5. Extract the .tgz files into their respective district folders.
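The extraction steps above might look like this as a minimal sketch. The `DISTRICTS` dictionary is a hypothetical three-entry subset: the actual folder abbreviations in the Strava download are not shown in the slides, and the full table would cover all 25 districts plus statewide. Inputs are plain arguments rather than `arcpy.GetParameterAsText()` so the sketch runs without ArcGIS.

```python
import os
import tarfile

# Hypothetical subset of the lookup: source folder abbreviation -> destination
# folder name. A full table would list all 25 TxDOT districts plus statewide.
DISTRICTS = {
    "AUS": "Austin",
    "HOU": "Houston",
    "ELP": "El Paso",
}


def extract_districts(src_root, dest_root, districts=DISTRICTS):
    """Find .tgz archives under src_root and extract each one into
    dest_root/<district full name>, keyed on the archive's parent folder."""
    extracted = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        abbrev = os.path.basename(os.path.normpath(dirpath))
        if abbrev not in districts:
            continue  # not a district folder we know how to map
        dest_dir = os.path.join(dest_root, districts[abbrev])
        for name in filenames:
            if not name.lower().endswith(".tgz"):
                continue
            os.makedirs(dest_dir, exist_ok=True)
            with tarfile.open(os.path.join(dirpath, name), "r:gz") as tar:
                if hasattr(tarfile, "data_filter"):
                    # Newer Pythons: reject absolute paths and ".." members
                    tar.extractall(dest_dir, filter="data")
                else:
                    tar.extractall(dest_dir)
            extracted.append((name, dest_dir))
    return extracted
```

Keying on the archive's parent folder name, as the slide describes, means unmapped folders are simply skipped rather than extracted to an unexpected location.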
  9. Step 3: The extracted data on the J: drive is checked for any corrupt files.
     (Screenshots: script tool with input parameters, geoprocessing results, record count in ArcCatalog)
  10. Step 3: The extracted data on the J: drive is checked for any corrupt files.
      1. Import the required modules.
      2. Set input variables using the arcpy module.
      3. Check the data by opening the files and getting each file’s record count, as done manually in ArcCatalog. Using os.walk, loop through each directory and get a list of the existing files.
      4. Open each file and get the record count of each related table. Using try and except, capture any error messages and count the corrupt files.
      5. Finally, if the corrupt-file count equals zero, confirm that all files passed QC.
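The try/except QC loop above can be sketched as below. The presenter's script reads record counts through arcpy; since arcpy needs ArcGIS, this sketch takes the record counter as a parameter and defaults to counting rows in CSV files. Under ArcGIS one could instead pass something like `lambda p: int(arcpy.management.GetCount(p)[0])` and find feature classes and tables with `arcpy.da.Walk`; treat that substitution as an assumption, not the original code.

```python
import csv
import os


def count_csv_rows(path):
    """Count data rows in a CSV file, excluding the header.
    An empty file raises StopIteration, so it is reported as corrupt."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header row
        return sum(1 for _ in reader)


def qc_check(root, count_records=count_csv_rows, extensions=(".csv",)):
    """Walk root, try to count records in every matching file, and collect
    the record counts plus the error message for any file that fails."""
    counts, errors = {}, {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                counts[path] = count_records(path)
            except Exception as exc:  # any read failure marks the file as corrupt
                errors[path] = repr(exc)
    return counts, errors


def summarize(counts, errors):
    """Zero corrupt files means QC passed, as on the slide."""
    if not errors:
        return f"All files passed QC ({len(counts)} checked)."
    return f"{len(errors)} corrupt file(s) found."
```

Returning the error messages keyed by path, rather than only a count, makes it easy to see which files need to be downloaded again.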
  11. Step 4: Delete folders with shapefiles and rename each geodatabase after its parent folder.
      (Screenshots: script tool with input parameters, geoprocessing results, output in ArcCatalog)
  12. Step 4: Delete directories with shapefiles and rename each geodatabase after its parent folder.
      1. Import the required modules.
      2. Set input variables using the arcpy module.
      3. Using os.walk, search for directories named “Edges”, “Nodes”, or “OD”, and delete those folders (with their shapefiles, CSV and DBF tables) using shutil.rmtree.
      4. While still inside the for loop, find the parent directory name of data.gdb and rename data.gdb after that parent directory.
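A minimal sketch of the clean-up loop above, assuming the renamed geodatabase should keep the `.gdb` suffix (so `data.gdb` under `Austin` becomes `Austin.gdb`) and that nothing has the geodatabase open; a plain `os.rename` fails if ArcGIS holds a lock, in which case `arcpy.management.Rename` is an alternative. The folder names come from the slide; the function name is hypothetical.

```python
import os
import shutil

# Folder names from the slide that hold the redundant shapefile/CSV/DBF copies.
SHAPEFILE_DIRS = {"Edges", "Nodes", "OD"}


def clean_and_rename(root):
    """Delete shapefile folders under root and rename each data.gdb after
    its parent folder. Returns (deleted paths, renamed paths)."""
    deleted, renamed = [], []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Iterate over a copy so entries can be removed from dirnames; pruning
        # dirnames stops os.walk from descending into folders that are gone.
        for name in list(dirnames):
            full = os.path.join(dirpath, name)
            if name in SHAPEFILE_DIRS:
                shutil.rmtree(full)
                dirnames.remove(name)
                deleted.append(full)
            elif name.lower() == "data.gdb":
                parent = os.path.basename(os.path.normpath(dirpath))
                new_path = os.path.join(dirpath, parent + ".gdb")
                os.rename(full, new_path)
                dirnames.remove(name)
                renamed.append(new_path)
    return deleted, renamed
```

Doing both actions in one top-down walk mirrors the slide's "while still inside the for loop" and avoids a second pass over the J: drive.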