used to track cycling and running exercises using GPS data. It is crowd-sourced data collected by bicyclists and runners through a smartphone app. TxDOT-PTN acquired statewide Strava data in coordination with TxDOT-PTN's bike/ped counting research project.
am tasked with downloading new bike and pedestrian data from Strava, extracting it, and performing a quality check. Afterward, redundant data in the form of shapefiles and related tables is deleted, retaining the geodatabases, which are then renamed after their respective parent folders. This process may or may not sound complicated, but it is definitely time consuming! Is there a way to save time and automate this process? Solution: Yes! Python is the answer. We can expedite the process and minimize the amount of manual interaction with a Python program.
go through the following steps manually:
1. Download data from Strava's FTP site to the TxDOT network drive (I:).
2. Extract the data from the I: drive to the J: drive, so that the data on the I: drive is kept as a backup of the original download.
3. Check the extracted data on the J: drive for any corrupt files.
4. The data comes in two formats: shapefiles and file geodatabases. Since the data is the same in both formats, and all users in our organization have access to ArcGIS software, we retain the geodatabases with their feature classes and tables and delete the shapefiles and related CSV and DBF files to save space on the network. Finally, rename each geodatabase after its respective parent folder.
Now, it definitely sounds complicated and is still time consuming… It also requires constant manual interaction and monitoring.
TxDOT's 25 districts and statewide onto the network I: drive. Step 2: Extract data from the I: drive to the J: drive; folders named with district abbreviations on the I: drive are extracted into folders named with full district names on the J: drive. Step 3: Check the extracted data on the J: drive for any corrupt files. Step 4: Delete the directories containing shapefiles to save space on the network, then rename each geodatabase after its respective parent folder. After: Manage time wisely and eliminate human errors by running these steps through a series of Python script tools.
1. Import required modules.
2. Set input variables using the arcpy module.
3. Create a function to download a single file from the FTP site.
4. Access the FTP host server using the ftplib module.
5. Get a list of all district folders and files from the source FTP site and create the same folder structure at the destination location.
6. Loop through the list of source folders/files and download each file by calling the single-file download function created in step 3 (see the sketch below).
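As a rough illustration, a minimal sketch of what such a download script might look like, using the ftplib and os modules. The host name, credentials, destination path, and the download_file helper here are hypothetical placeholders; the real values come from the script tool's input variables, and this assumes a flat layout of district folders containing files at the FTP root.

import os
import ftplib

def download_file(ftp, remote_name, local_path):
    # Download a single file from the FTP site to a local path.
    with open(local_path, 'wb') as f:
        ftp.retrbinary('RETR ' + remote_name, f.write)

# Hypothetical host and credentials; supplied as inputs in the real tool.
ftp = ftplib.FTP('ftp.example.com')
ftp.login('user', 'password')

dest_root = r'I:\Strava'  # destination on the I: drive (placeholder path)

# Recreate the source folder structure and download each file.
for folder in ftp.nlst():                  # district folders at the FTP root
    local_folder = os.path.join(dest_root, folder)
    os.makedirs(local_folder, exist_ok=True)
    for remote_name in ftp.nlst(folder):   # files inside each district folder
        local_path = os.path.join(local_folder, os.path.basename(remote_name))
        download_file(ftp, remote_name, local_path)

ftp.quit()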
drive:
1. Import required modules.
2. Set input variables using the arcpy module.
3. Create a dictionary with the source district abbreviation as the key and the destination district full name as the value.
4. Using the os.walk command, search the source district folders for any .tgz files, get the parent folder name, and create the corresponding destination district folder from the dictionary.
5. Extract the .tgz files into their respective district folders (see the sketch below).
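A minimal sketch of the extraction step, using the standard os and tarfile modules. The paths are hypothetical, and the district dictionary is abridged to three entries for illustration; the real mapping covers all 25 districts.

import os
import tarfile

src_root = r'I:\Strava'   # source of downloaded archives (placeholder path)
dest_root = r'J:\Strava'  # extraction destination (placeholder path)

# Key: source district abbreviation; value: destination full district name.
districts = {'AUS': 'Austin', 'DAL': 'Dallas', 'HOU': 'Houston'}  # abridged

for dirpath, dirnames, filenames in os.walk(src_root):
    for name in filenames:
        if name.endswith('.tgz'):
            abbrev = os.path.basename(dirpath)  # parent folder name
            dest_folder = os.path.join(dest_root, districts.get(abbrev, abbrev))
            os.makedirs(dest_folder, exist_ok=True)
            # Extract the archive under its respective district folder.
            with tarfile.open(os.path.join(dirpath, name)) as tgz:
                tgz.extractall(dest_folder)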
for any corrupt files:
1. Import required modules.
2. Set input variables using the arcpy module.
3. Check the data by opening files (as one would in ArcCatalog) and getting a record count from each file. Using the os.walk command, loop through each directory and get a list of existing files.
4. Open each file and get the record count of each related table. Using a try/except block, capture any error messages and keep a count of corrupt files.
5. Finally, if the corrupt-file count equals zero, confirm that all files passed QC (see the sketch below).
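A hedged sketch of the QC pass, assuming a hypothetical data root. It uses arcpy.da.Walk to visit the feature classes and tables inside each geodatabase, and arcpy's GetCount tool to open each item; any item that fails to open is counted as corrupt.

import os
import arcpy

data_root = r'J:\Strava'  # root of the extracted data (placeholder path)
corrupt = 0

# arcpy.da.Walk visits feature classes and tables inside geodatabases.
for dirpath, dirnames, filenames in arcpy.da.Walk(
        data_root, datatype=['FeatureClass', 'Table']):
    for name in filenames:
        item = os.path.join(dirpath, name)
        try:
            # Opening the item and counting its records flags corrupt files.
            count = int(arcpy.management.GetCount(item).getOutput(0))
            print('{}: {} records'.format(item, count))
        except Exception as err:
            corrupt += 1
            print('Corrupt file: {} ({})'.format(item, err))

if corrupt == 0:
    print('All files passed QC.')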
its parent folder:
1. Import required modules.
2. Set input variables using the arcpy module.
3. Using the os.walk command, search for directories named "Edges", "Nodes", or "OD", and delete those folders of shapefiles, CSV, and DBF tables using the shutil.rmtree command.
4. While still inside the for loop, find the parent directory name of data.gdb and rename the geodatabase after that parent directory (see the sketch below).
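A minimal sketch of the cleanup and rename step, under the same hypothetical paths: os.walk finds the target folders, shutil.rmtree deletes them, and os.rename gives each data.gdb its parent folder's name.

import os
import shutil

data_root = r'J:\Strava'  # root of the extracted data (placeholder path)

for dirpath, dirnames, filenames in os.walk(data_root):
    # Delete shapefile/CSV/DBF folders to save space on the network.
    for folder in list(dirnames):
        if folder in ('Edges', 'Nodes', 'OD'):
            shutil.rmtree(os.path.join(dirpath, folder))
            dirnames.remove(folder)  # don't descend into deleted folders
    # Rename data.gdb after its parent folder, e.g. Austin\data.gdb -> Austin.gdb.
    if 'data.gdb' in dirnames:
        parent = os.path.basename(dirpath)
        os.rename(os.path.join(dirpath, 'data.gdb'),
                  os.path.join(dirpath, parent + '.gdb'))
        dirnames.remove('data.gdb')  # skip walking into the renamed geodatabase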