Slide 1

Slide 1 text

Time Savings With Python
November 13, 2019
Sesha Sailaja Choutagunta, Graphic Design Consultant, NTT Data

Slide 2

Slide 2 text

• Strava is a social fitness network primarily used to track cycling and running exercises using GPS data. The data is crowd-sourced, collected from bicyclists and runners through a smartphone app.
• TxDOT-PTN acquired statewide Strava data in coordination with TxDOT-PTN’s bike/ped counting research project.

Slide 3

Slide 3 text

Beating the clock

Problem: Every quarter, I am tasked with downloading new bike and pedestrian data from Strava, extracting it, and performing a quality check. I then delete the redundant data delivered as shapefiles and related tables, retain the geodatabases, and rename each geodatabase after its parent folder. This process may or may not sound complicated, but it is definitely time consuming! Is there a way to save time and automate this process?

Solution: Yes! Python is the answer. We can expedite this process and minimize the amount of manual interaction with a Python program.

Slide 4

Slide 4 text

Before: Prior to writing a Python program, I had to go through the following steps manually:
1. Download data from Strava’s FTP site to the TxDOT network drive (I:).
2. Extract the data from the I: drive to the J: drive, so the data on the I: drive is kept as a backup of the original download.
3. Check the extracted data on the J: drive for any corrupt files.
4. The data comes in two formats, shapefiles and file geodatabases. Since the data is the same in both formats, and all users in our organization have access to ArcGIS software, we retain the geodatabases with their feature classes and tables and delete the shapefiles and related csv and dbf files to save space on the network. Finally, rename each geodatabase after its parent folder.

Now it definitely sounded complicated and was still time consuming… and it needed constant manual interaction and monitoring.

Slide 5

Slide 5 text

View from ArcCatalog

Slide 6

Slide 6 text

After: Manage time wisely and eliminate human errors by working through these steps with a series of Python script tools.

Step 1: Script tool to download Strava quarterly data for TxDOT’s 25 districts and statewide onto the network I: drive.
Step 2: Extract the data from the I: drive to the J: drive. Folders named with district abbreviations on the I: drive are extracted into folders named with the full district name on the J: drive.
Step 3: Check the extracted data on the J: drive for any corrupt files.
Step 4: Delete the directories with shapefiles to save space on the network, then rename each geodatabase after its parent folder.

Slide 7

Slide 7 text

Step 1: Python script to download data from FTP site
(Screenshots: script tool with input parameters; output in Windows Explorer; geoprocessing results)

Slide 8

Slide 8 text

Step 1: Python script to download data from FTP site
1. Import required modules.
2. Set input variables using the arcpy module.
3. Create a function to download a single file from the FTP site.
4. Access the FTP host server using the ftplib module.
5. Get a list of all district folders and files from the source FTP site and create the same folder structure at the destination location.
6. Loop through the list of source folders/files and download each file by calling the single-file download function created earlier.
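Below is a minimal sketch of this step. The FTP host, credentials, parameter order, and the assumption of one FTP folder per district are all placeholders for illustration, not the actual tool’s values.

```python
# Sketch of Step 1: download Strava data from the FTP site (hypothetical parameters).
import os
import ftplib
import arcpy

# Script-tool input parameters (illustrative order and meaning).
ftp_host = arcpy.GetParameterAsText(0)      # FTP server name
ftp_user = arcpy.GetParameterAsText(1)      # FTP user
ftp_password = arcpy.GetParameterAsText(2)  # FTP password
dest_root = arcpy.GetParameterAsText(3)     # destination folder on the I: drive

def download_file(ftp, remote_name, local_path):
    """Download a single file from the current FTP directory."""
    with open(local_path, "wb") as f:
        ftp.retrbinary("RETR " + remote_name, f.write)

ftp = ftplib.FTP(ftp_host)
ftp.login(ftp_user, ftp_password)

# Assume each district has its own folder on the FTP site; mirror that
# folder structure at the destination and download every file in it.
for district in ftp.nlst():
    local_dir = os.path.join(dest_root, district)
    if not os.path.exists(local_dir):
        os.makedirs(local_dir)
    ftp.cwd(district)
    for file_name in ftp.nlst():
        arcpy.AddMessage("Downloading {0}/{1}".format(district, file_name))
        download_file(ftp, file_name, os.path.join(local_dir, file_name))
    ftp.cwd("..")

ftp.quit()
```

Wrapping the single-file download in its own function keeps the loop readable and makes it easy to add retries later if a transfer fails.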

Slide 9

Slide 9 text

Step 2: Extract data from I: drive onto J: drive
(Screenshots: script tool with input parameters; output in Windows Explorer; geoprocessing results)

Slide 10

Slide 10 text

Step 2: Extract data from I: drive onto J: drive
1. Import required modules.
2. Set input variables using the arcpy module.
3. Create a dictionary with the source district abbreviation as the key and the destination district full name as the value.
4. Using the os.walk command, search the source district folders for any .tgz files, get the parent folder name, and create the corresponding destination district folder from the dictionary.
5. Then extract the .tgz files under their respective district folders.
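A minimal sketch of this step follows. The parameter order and the abbreviation-to-name entries in the dictionary are illustrative placeholders; the real tool would list all 25 districts.

```python
# Sketch of Step 2: extract .tgz archives from I: into full-name folders on J:.
import os
import tarfile
import arcpy

src_root = arcpy.GetParameterAsText(0)   # source root on the I: drive (illustrative)
dest_root = arcpy.GetParameterAsText(1)  # destination root on the J: drive (illustrative)

# Key = district abbreviation on I:, value = full district name on J:
# (only a few placeholder entries shown here).
districts = {
    "AUS": "Austin",
    "HOU": "Houston",
    "SAT": "San Antonio",
}

for dirpath, dirnames, filenames in os.walk(src_root):
    for name in filenames:
        if name.lower().endswith(".tgz"):
            abbrev = os.path.basename(dirpath)         # parent folder = abbreviation
            full_name = districts.get(abbrev, abbrev)  # fall back to the abbreviation
            out_dir = os.path.join(dest_root, full_name)
            if not os.path.exists(out_dir):
                os.makedirs(out_dir)
            arcpy.AddMessage("Extracting {0} into {1}".format(name, out_dir))
            with tarfile.open(os.path.join(dirpath, name), "r:gz") as tar:
                tar.extractall(out_dir)
```

Because the extraction writes only to the J: drive, the original downloads on the I: drive remain untouched as the backup copy described earlier.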

Slide 11

Slide 11 text

Step 3: The extracted data on J: drive is checked for any corrupt files
(Screenshots: script tool with input parameters; geoprocessing results; record count in ArcCatalog)

Slide 12

Slide 12 text

Step 3: The extracted data on J: drive is checked for any corrupt files.
1. Import required modules.
2. Set input variables using the arcpy module.
3. Check the data by opening the files and getting a record count from each one. Using the os.walk command, loop through each directory and get a list of existing files.
4. Then open each file and get the record count of each related table. Using a try/except block, capture any error messages and keep a count of corrupt files.
5. Finally, if the corrupt-file count equals zero, confirm that all files passed QC.
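Here is a minimal sketch of the QC pass. It assumes the extracted data sits in file geodatabases under the J: drive root passed as the first parameter; the parameter order is illustrative.

```python
# Sketch of Step 3: count records in every geodatabase dataset and flag corrupt files.
import os
import arcpy

dest_root = arcpy.GetParameterAsText(0)  # extracted-data root on the J: drive (illustrative)
corrupt_count = 0

for dirpath, dirnames, filenames in os.walk(dest_root):
    gdbs = [d for d in dirnames if d.lower().endswith(".gdb")]
    # Don't let os.walk descend into the geodatabase folders themselves.
    dirnames[:] = [d for d in dirnames if d not in gdbs]
    for gdb_name in gdbs:
        arcpy.env.workspace = os.path.join(dirpath, gdb_name)
        # List both feature classes and tables in the geodatabase.
        datasets = (arcpy.ListFeatureClasses() or []) + (arcpy.ListTables() or [])
        for dataset in datasets:
            try:
                count = int(arcpy.GetCount_management(dataset).getOutput(0))
                arcpy.AddMessage("{0}\\{1}: {2} records".format(gdb_name, dataset, count))
            except Exception as err:
                corrupt_count += 1
                arcpy.AddWarning("Could not read {0}\\{1}: {2}".format(gdb_name, dataset, err))

if corrupt_count == 0:
    arcpy.AddMessage("All files passed QC.")
else:
    arcpy.AddWarning("{0} corrupt file(s) found.".format(corrupt_count))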

Slide 13

Slide 13 text

Step 4: Delete folders with shapefiles and rename geodatabases after their parent folder
(Screenshots: script tool with input parameters; geoprocessing results; output in ArcCatalog)

Slide 14

Slide 14 text

Step 4: Delete directories with shapefiles and rename each geodatabase after its parent folder.
1. Import required modules.
2. Set input variables using the arcpy module.
3. Using the os.walk command, search for directories named “Edges”, “Nodes”, or “OD” and, using the shutil.rmtree command, delete those folders containing shapefiles, csv, and dbf tables.
4. While still inside the for loop, find the parent directory of data.gdb and rename the geodatabase after that parent directory.
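A minimal sketch of this cleanup step is below. The folder names “Edges”, “Nodes”, “OD”, and “data.gdb” come from the slides; the parameter order is an assumption, and renaming the geodatabase folder directly with os.rename is one possible approach (arcpy.Rename_management would be an alternative).

```python
# Sketch of Step 4: delete shapefile folders and rename data.gdb after its parent folder.
import os
import shutil
import arcpy

dest_root = arcpy.GetParameterAsText(0)  # extracted-data root on the J: drive (illustrative)
shapefile_dirs = ("Edges", "Nodes", "OD")

for dirpath, dirnames, filenames in os.walk(dest_root):
    for dir_name in list(dirnames):
        if dir_name in shapefile_dirs:
            # Remove the folder of shapefiles, csv, and dbf tables to save network space.
            shutil.rmtree(os.path.join(dirpath, dir_name))
            dirnames.remove(dir_name)          # don't walk into a deleted folder
        elif dir_name.lower() == "data.gdb":
            parent = os.path.basename(dirpath)  # parent (district) folder name
            new_path = os.path.join(dirpath, parent + ".gdb")
            if not os.path.exists(new_path):
                os.rename(os.path.join(dirpath, dir_name), new_path)
                arcpy.AddMessage("Renamed data.gdb to {0}.gdb".format(parent))
            dirnames.remove(dir_name)          # skip descending into the renamed gdb
```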

Slide 15

Slide 15 text

References
• https://docs.python.org/3/library/ftplib.html
• https://pro.arcgis.com/en/pro-app/arcpy/main/arcgis-pro-arcpy-reference.htm
• https://gis.stackexchange.com

My GitHub repository link: https://github.com

Slide 16

Slide 16 text

Questions?
Email: [email protected]
Thank you!