Time Savings with Python

ATX GIS Day
November 13, 2019

Sesha Choutagunta, Graphic Design Consultant, NTT Data

Transcript

  1. Time Savings with Python
    November 13, 2019
    Sesha Sailaja Choutagunta
    Graphic Design Consultant, NTT Data

  2. • Strava is a social fitness network
    used primarily to track cycling and
    running activities with GPS. Its data
    is crowd-sourced: bicyclists and
    runners record it with a smartphone app.
    • TxDOT-PTN acquired statewide
    Strava data in coordination with
    its bike/ped counting research
    project.

  3. Beating the clock
    Problem:
    Every quarter, I am tasked with downloading new bike and pedestrian
    data from Strava, extracting it, and performing a quality check. I then
    delete the redundant shapefiles and related tables, keep the
    geodatabases, and rename each geodatabase after its parent folder.
    This process may not sound complicated, but it is definitely time
    consuming!
    Is there a way to save time and automate this process?
    Solution:
    Yes! Python is the answer. A Python program can expedite this process
    and minimize the amount of manual interaction.

  4. Before:
    Prior to writing a Python program, I had to go through the following
    steps manually:
    1. Download the data from Strava’s FTP site to the TxDOT network drive (I:).
    2. Extract the data from the I: drive to the J: drive, so that the data on the I: drive
    is kept as a backup of the original download.
    3. Check the extracted data on the J: drive for corrupt files.
    4. The data comes in two formats, shapefiles and file geodatabases. Since
    the data is the same in both formats, and all users in our organization have
    access to ArcGIS software, retain the geodatabases with their feature classes
    and tables and delete the shapefiles and related csv and dbf files to save space on
    the network. Finally, rename each geodatabase after its parent
    folder.
    Now it definitely sounds complicated, and it is still time consuming…
    It also needs constant manual interaction and monitoring.

  5. View from ArcCatalog

  6. After: Manage time wisely and eliminate human errors by working through these
    steps with a series of Python script tools.
    Step 1
    Script tool to download Strava quarterly data for TxDOT’s 25 districts and
    statewide onto the network I: drive.
    Step 2
    Extract data from the I: drive to the J: drive. Folders named with a district
    abbreviation on the I: drive are extracted into folders named with the
    district's full name on the J: drive.
    Step 3
    Check the extracted data on the J: drive for corrupt files.
    Step 4
    Delete the directories with shapefiles to save space on the network. Then
    rename each geodatabase after its parent folder.

  7. Step 1: Python script to download data from FTP site
    Script tool with input parameters
    Output in Windows Explorer
    Geoprocessing results

  8. Step 1: Python script to download data from FTP site
    1. Import required modules.
    2. Set input variables… using arcpy module.
    3. Create a function to download a single file from the FTP site.
    4. Connect to the FTP host server using the ftplib module.
    5. Get a list of all district folders and files from the
    source FTP site and create the same folder structure at
    the destination location.
    6. Loop through the list of source folders/files and
    download each file by calling the single-file download
    function.
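
The download steps above can be sketched roughly as follows. This is a minimal illustration, not the presenter's script: the function names, the host name in the usage comment, and the assumption that the FTP site holds one level of district folders with files inside them are all hypothetical. In a script tool, the input variables would typically come from arcpy (for example arcpy.GetParameterAsText), which is left out here so the sketch runs without ArcGIS.

```python
import os
import posixpath
from ftplib import FTP  # used in the usage comment at the bottom


def download_file(ftp, remote_path, local_path):
    """Download a single file from the FTP site to local_path."""
    folder = os.path.dirname(local_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(local_path, "wb") as handle:
        # RETR streams the remote file; each chunk is written to disk.
        ftp.retrbinary("RETR " + remote_path, handle.write)


def download_districts(ftp, remote_root, local_root):
    """Mirror <remote_root>/<district>/<file> into local_root.

    Assumption: every entry under remote_root is a district folder
    that contains only files (no deeper nesting).
    """
    saved = []
    for district in ftp.nlst(remote_root):
        # Servers may return bare names or full paths; keep the last part.
        district_name = posixpath.basename(district.rstrip("/"))
        remote_dir = posixpath.join(remote_root, district_name)
        local_dir = os.path.join(local_root, district_name)
        os.makedirs(local_dir, exist_ok=True)
        for entry in ftp.nlst(remote_dir):
            file_name = posixpath.basename(entry)
            local_path = os.path.join(local_dir, file_name)
            download_file(ftp, posixpath.join(remote_dir, file_name), local_path)
            saved.append(local_path)
    return saved


# Usage (hypothetical host, credentials, and paths):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     download_districts(ftp, "/strava", r"I:\Strava")
```

Passing the connected FTP object into the functions keeps the connection logic in one place and makes the download loop easy to try against a stand-in object before pointing it at the real server.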

  9. Step 2: Extract data from I: drive on to J: drive
    Script tool with input parameters
    Output in Windows Explorer
    Geoprocessing results

  10. Step 2: Extract data from I: drive on to J: drive
    1. Import required modules.
    2. Set input variables using arcpy module.
    3. Create a dictionary that maps each source district
    abbreviation (key) to the destination district
    full name (value).
    4. Using os.walk, search the source district
    folders for .tgz files, get each file's parent folder name, and create
    the corresponding destination district folder from the dictionary.
    5. Extract each .tgz file into its respective district
    folder.
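
A rough sketch of the extraction steps above, using only the standard library. The function name and the sample dictionary entry in the usage comment are illustrative assumptions; the real mapping would cover all 25 districts plus statewide, and the paths would come from the script tool's arcpy parameters.

```python
import os
import tarfile


def extract_archives(source_root, dest_root, district_names):
    """Extract every .tgz under source_root into a district folder.

    district_names maps a source folder name (district abbreviation)
    to the destination folder name (district full name). Folders not
    found in the dictionary keep their original name.
    """
    extracted = []
    for folder, _subdirs, files in os.walk(source_root):
        for file_name in files:
            if not file_name.lower().endswith(".tgz"):
                continue
            abbreviation = os.path.basename(folder)
            full_name = district_names.get(abbreviation, abbreviation)
            dest_dir = os.path.join(dest_root, full_name)
            os.makedirs(dest_dir, exist_ok=True)
            # Use the safer "data" extraction filter where this Python has it.
            options = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            with tarfile.open(os.path.join(folder, file_name), "r:gz") as archive:
                archive.extractall(dest_dir, **options)
            extracted.append(dest_dir)
    return extracted


# Usage (hypothetical mapping and paths):
# extract_archives(r"I:\Strava", r"J:\Strava", {"AUS": "Austin"})
```

Because the archives are only read, the files on the source drive stay untouched and continue to serve as the backup copy.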

  11. Step 3: The extracted data on J: drive is checked for any corrupt files.
    Script tool with input parameters
    Geoprocessing results
    Record count in ArcCatalog

  12. Step 3: The extracted data on J: drive is checked for any corrupt files.
    1. Import required modules
    2. Set input variables using arcpy module
    3. Check the data by opening the files in ArcCatalog and
    getting the record count from each file. Using os.walk,
    loop through each directory and get a
    list of existing files.
    4. Then open each file and get the record count of each
    related table. Using try and except, capture any
    error messages and keep a count of corrupt files.
    5. Finally, if the count of corrupt files equals
    zero, confirm that all files passed QC.
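
The QC loop above might look something like the sketch below. The record-counting step is passed in as a function so the example runs without ArcGIS; inside a script tool one plausible choice (an assumption, not confirmed by the slides) is a small wrapper around arcpy.management.GetCount. The function name and the default list of extensions are also hypothetical.

```python
import os


def check_files(root, count_records, extensions=(".shp", ".dbf", ".csv")):
    """Walk root, count records in each matching file, and collect failures.

    count_records is any callable that takes a path and returns a record
    count, raising an exception when the file cannot be read.
    Returns (counts, corrupt): a dict of path -> count and a list of
    (path, error message) pairs.
    """
    counts = {}
    corrupt = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            try:
                counts[path] = count_records(path)
            except Exception as error:
                # An unreadable file is treated as corrupt, not as fatal.
                corrupt.append((path, str(error)))
    return counts, corrupt


# Usage inside ArcGIS (assumed wrapper, hypothetical path):
# import arcpy
# counts, corrupt = check_files(
#     r"J:\Strava", lambda p: int(arcpy.management.GetCount(p)[0]))
# if not corrupt:
#     arcpy.AddMessage("All files passed QC.")
```

Catching the exception per file lets the run finish and report every corrupt file at once, instead of stopping at the first one.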

  13. Step 4: Delete folders with shapefiles and rename geodatabases after their parent folder.
    Script tool with input parameters
    Geoprocessing results
    Output in ArcCatalog

  14. Step 4: Delete directories with shapefiles and rename geodatabases after their
    parent folder.
    1. Import required modules
    2. Set input variables using arcpy module.
    3. Using os.walk, search for directories
    named “Edges”, “Nodes”, or “OD”, and use
    shutil.rmtree to delete those folders along with their
    shapefiles, csv and dbf tables.
    4. While still inside the for loop, find the parent directory
    name of data.gdb and rename data.gdb after that parent directory.
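
One way the cleanup steps above could be written is sketched below. The function name is hypothetical, and renaming with os.rename is an assumption: the slides do not say whether the rename is done with os or with an arcpy tool. A file geodatabase is a folder on disk, so it is handled here as a directory.

```python
import os
import shutil

# Folder names that hold the redundant shapefiles, csv and dbf tables.
SHAPEFILE_DIRS = {"Edges", "Nodes", "OD"}


def clean_and_rename(root):
    """Delete shapefile folders and rename data.gdb after its parent folder.

    Returns (removed, renamed): lists of the deleted folder paths and
    the new geodatabase paths.
    """
    removed = []
    renamed = []
    for folder, subdirs, _files in os.walk(root):
        # Iterate over a copy so subdirs can be pruned while looping.
        for name in list(subdirs):
            path = os.path.join(folder, name)
            if name in SHAPEFILE_DIRS:
                shutil.rmtree(path)
                subdirs.remove(name)  # do not descend into a deleted folder
                removed.append(path)
            elif name == "data.gdb":
                parent = os.path.basename(folder)
                new_path = os.path.join(folder, parent + ".gdb")
                os.rename(path, new_path)
                subdirs.remove(name)  # no need to walk inside the geodatabase
                renamed.append(new_path)
    return removed, renamed


# Usage (hypothetical path):
# removed, renamed = clean_and_rename(r"J:\Strava")
```

Since shutil.rmtree deletes permanently, it is worth running this only after the QC step has passed and the original downloads are safely backed up.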

  15. References & my GitHub Repository Link:
    https://docs.python.org/3/library/ftplib.html
    https://pro.arcgis.com/en/pro-app/arcpy/main/arcgis-pro-arcpy-reference.htm
    https://gis.stackexchange.com
    https://github.com

  16. Questions?
    Email: [email protected]
    Thank you!
