Time Savings with Python

November 13, 2019
Sesha Sailaja Choutagunta
Graphic Design Consultant, NTT Data

• Strava is a social fitness network primarily used to track cycling and running activity with GPS data. The data is crowd-sourced: bicyclists and runners collect it with a smartphone app.
• TxDOT-PTN acquired statewide Strava data in coordination with its bike/ped counting research project.

Beating the clock

Problem: Every quarter, I am tasked with downloading new bike and pedestrian data from Strava, extracting it, and performing a quality check. Afterward, I delete the redundant copies of the data (shapefiles and related tables), retain the geodatabases, and rename each geodatabase after its parent folder. This process may or may not sound complicated, but it is definitely time consuming! Is there a way to save time and automate it?

Solution: Yes! Python is the answer. A Python program can expedite this process and minimize the amount of manual interaction.

Before: Prior to writing a Python program, I had to go through the following steps manually:
1. Download data from Strava’s FTP site to the TxDOT network drive (I:).
2. Extract the data from the I: drive to the J: drive, so that the data on the I: drive is kept as a backup of the original download.
3. Check the extracted data on the J: drive for corrupt files.
4. The data comes in two formats, shapefiles and file geodatabases. Since the data is the same in both formats and all users in our organization have access to ArcGIS software, retain the geodatabases with their feature classes and tables, and delete the shapefiles and related csv and dbf files to save space on the network. Finally, rename each geodatabase after its parent folder.

Now it definitely sounds complicated, and it is still time consuming. It also needs constant manual interaction and monitoring.

[Figure: view from ArcCatalog]

After: Manage time wisely and eliminate human errors by working through these steps with a series of Python script tools.
Step 1: Script tool to download Strava quarterly data for the 25 TxDOT Districts and Statewide onto the network I: drive.
Step 2: Extract data from the I: drive to the J: drive. Folders named with a district abbreviation on the I: drive are extracted into folders named with the district's full name on the J: drive.
Step 3: Check the extracted data on the J: drive for corrupt files.
Step 4: Delete the directories with shapefiles to save space on the network. Then rename each geodatabase after its parent folder.

Step 1: Python script to download data from FTP site
[Figure: script tool with input parameters; output in Windows Explorer; geoprocessing results]

Step 1: Python script to download data from FTP site
1. Import required modules.
2. Set input variables… using arcpy module.
3. Create a function to download a single file from the FTP site.
4. Access the FTP host server using the ftplib module.
5. Get a list of all district folders and files from the source FTP site and create the same folder structure at the destination location.
6. Loop through the list of source folders/files and download each file by calling the single-file download function.
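
The download steps above might be sketched roughly as follows. This is not the script from the talk: the function names are hypothetical, the sketch assumes the FTP root holds one folder per district with only files inside, and it assumes the server's name listing returns bare names. Inside a script tool, the host, credentials, and destination folder could be read with arcpy.GetParameterAsText; here they are ordinary arguments so the sketch runs without ArcGIS.

```python
import os
from ftplib import FTP  # used when opening a real connection (see the comments at the end)


def download_file(ftp, remote_name, local_path):
    """Download one file from the FTP server's current directory."""
    with open(local_path, "wb") as local_file:
        ftp.retrbinary("RETR " + remote_name, local_file.write)


def download_district_folders(ftp, dest_root):
    """Recreate each district folder under dest_root and download its files.

    Assumes every entry in the FTP root is a district folder that
    contains only files (no nested folders).
    """
    downloaded = []
    for district in ftp.nlst():
        local_dir = os.path.join(dest_root, district)
        os.makedirs(local_dir, exist_ok=True)
        ftp.cwd(district)
        for file_name in ftp.nlst():
            local_path = os.path.join(local_dir, file_name)
            download_file(ftp, file_name, local_path)
            downloaded.append(local_path)
        ftp.cwd("..")  # return to the FTP root before the next district
    return downloaded


# Example call with placeholder host, credentials, and destination folder:
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     download_district_folders(ftp, r"I:\Strava")
```

Keeping the single-file download in its own function, as the slide describes, means the loop only has to decide which files to fetch and where to put them.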

Step 2: Extract data from I: drive on to J: drive
[Figure: script tool with input parameters; output in Windows Explorer; geoprocessing results]

Step 2: Extract data from I: drive on to J: drive
1. Import required modules.
2. Set input variables using arcpy module.
3. Create a dictionary with the source district abbreviation as key and the destination district full name as value.
4. Using the os.walk command, search through the source district folders for any .tgz files, get the parent folder name, and create the corresponding destination district folder from the dictionary.
5. Then extract each .tgz file under its respective district folder.
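
A minimal sketch of this extraction step, under stated assumptions: the dictionary below is a hypothetical three-entry sample rather than the full district list, the function name is made up for illustration, and the source and destination roots are plain arguments instead of arcpy parameters.

```python
import os
import tarfile

# Hypothetical sample of the abbreviation-to-full-name dictionary.
DISTRICT_NAMES = {"AUS": "Austin", "DAL": "Dallas", "HOU": "Houston"}


def extract_archives(source_root, dest_root, district_names=DISTRICT_NAMES):
    """Extract every .tgz under source_root into a full-name district folder."""
    extracted = []
    for folder, _subfolders, files in os.walk(source_root):
        for file_name in files:
            if not file_name.lower().endswith(".tgz"):
                continue
            # The parent folder name is the district abbreviation.
            abbreviation = os.path.basename(folder)
            # Fall back to the abbreviation if it is missing from the dictionary.
            full_name = district_names.get(abbreviation, abbreviation)
            dest_dir = os.path.join(dest_root, full_name)
            os.makedirs(dest_dir, exist_ok=True)
            with tarfile.open(os.path.join(folder, file_name), "r:gz") as archive:
                archive.extractall(dest_dir)
            extracted.append(dest_dir)
    return extracted
```

Because extractall writes whatever paths the archive contains, a sketch like this is only appropriate for archives from a trusted source.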

Step 3: The extracted data on J: drive is checked for any corrupt files.
[Figure: script tool with input parameters; geoprocessing results; record count in ArcCatalog]

Step 3: The extracted data on J: drive is checked for any corrupt files.
1. Import required modules.
2. Set input variables using arcpy module.
3. Check data by opening files in ArcCatalog and getting the record count from each file. Using the os.walk command, loop through each directory and get a list of existing files.
4. Then open each file and get the record count of each related table. Using try and except, capture any error messages and get a count of corrupt files.
5. Finally, if the count of corrupt files equals zero, confirm that all files passed QC.
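
The QC loop could look something like the sketch below. To keep it runnable without ArcGIS, it counts rows in csv files with the standard csv module as a stand-in. In the actual tool the record counter would presumably wrap an arcpy call such as arcpy.management.GetCount, but that is an assumption, as are all function names here.

```python
import csv
import os


def count_csv_records(path):
    """Stand-in record counter: data rows in a CSV file, header excluded."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    return max(len(rows) - 1, 0)


def check_files(root, count_records=count_csv_records, extensions=(".csv",)):
    """Walk root, count records per file, and collect files that fail to open."""
    counts, corrupt = {}, []
    for folder, _subfolders, files in os.walk(root):
        for file_name in files:
            if not file_name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, file_name)
            try:
                counts[path] = count_records(path)
            except Exception as error:
                # Any failure to read the file marks it as possibly corrupt.
                corrupt.append((path, str(error)))
    return counts, corrupt


def qc_message(corrupt):
    """Summarize the QC result from the list of corrupt files."""
    if not corrupt:
        return "All files passed QC."
    return "{} corrupt file(s) found.".format(len(corrupt))
```

Passing the counter in as a function keeps the walk-and-catch logic separate from how a given file type is opened.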

Step 4: Delete folders with shapefiles and rename geodatabases as its parent folder.
[Figure: script tool with input parameters; geoprocessing results; output in ArcCatalog]

Step 4: Delete directories with shapefiles and rename each geodatabase after its parent folder.
1. Import required modules.
2. Set input variables using arcpy module.
3. Using the os.walk command, search for directories named “Edges”, “Nodes”, or “OD” and, using the shutil.rmtree command, delete those folders along with their shapefiles, csv, and dbf tables.
4. While still inside the for loop, find the parent directory name of data.gdb and rename the geodatabase after that parent directory.
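
One way to sketch the cleanup step, assuming the folder names match “Edges”, “Nodes”, and “OD” exactly and that the renamed geodatabase keeps its .gdb extension (for example, Austin/data.gdb becomes Austin/Austin.gdb). The function name is hypothetical, and the root folder is passed as a plain argument rather than an arcpy parameter.

```python
import os
import shutil

SHAPEFILE_FOLDERS = {"Edges", "Nodes", "OD"}


def clean_and_rename(root):
    """Delete shapefile folders and rename each data.gdb after its parent folder."""
    deleted, renamed = [], []
    for folder, subfolders, _files in os.walk(root, topdown=True):
        parent_name = os.path.basename(os.path.normpath(folder))
        for name in list(subfolders):
            path = os.path.join(folder, name)
            if name in SHAPEFILE_FOLDERS:
                shutil.rmtree(path)
                subfolders.remove(name)  # do not descend into a deleted folder
                deleted.append(path)
            elif name == "data.gdb":
                new_path = os.path.join(folder, parent_name + ".gdb")
                os.rename(path, new_path)
                subfolders.remove(name)  # a file geodatabase is a folder; skip its contents
                renamed.append(new_path)
    return deleted, renamed
```

Since shutil.rmtree deletes permanently, running this only against the extracted copy on the J: drive keeps the I: drive backup intact.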

References & my GitHub Repository
https://docs.python.org/3/library/ftplib.html
https://pro.arcgis.com/en/pro-app/arcpy/main/arcgis-pro-arcpy-reference.htm
https://gis.stackexchange.com
GitHub repository link: https://github.com

Questions? Email: [email protected] Thank you!


ATX GIS Day
