TRANSCRIPT
Python Geoprocessing for Rangeland Management: Guidance for Developing Efficient Tools and Processes
David Howes, Ph.D.
David Howes, LLC
dhowes.com
Eric Sant
Open Range Consulting
openrangeconsulting.com
Northwest GIS 2019, November 6th, 2019
David Howes, Ph.D.
• Education
• B.Sc. (Hons) in Geography – University of Salford, England
• M.Sc. in Geographic Information Systems – University of Edinburgh, Scotland
• Ph.D. in Geomorphology – State University of New York at Buffalo, New York
• 28 years in GIS
• Specialty: GIS tools, processes and supporting infrastructure
• Established
• David Howes, LLC in 2012
• GISPD.com in 2014
Eric Sant, M.S.
• Education
• B.S. in Geography – Utah State University
• M.S. in Geography – Utah State University
• Specialty
Improving grazing management by giving land managers geospatial assessment products that are accurate, timely, and cost efficient
• Experience
Worked on a wide variety of federal, state, and private industry projects concerned with assessing the biological value of large landscapes
Acknowledgement
Tim Bateman - Geospatial Analyst, Open Range Consulting
Open Range Consulting
• http://openrangeconsulting.com/
• “Open Range Consulting wants to improve the world through rangeland management by providing statistically valid, economically feasible and expedient landscape information”
Presentation Approach
• Requirement
• Solution
• Considerations
Requirement
Tasks
• Run the ArcGIS Spatial Analyst Zonal Statistics as Table tool for a set of zones and a set of rasters derived from a composite (four-band) raster (from aerial imagery)
• Use the zonal statistics data to develop a model using the statistical package R
• Apply the model to rasters derived from the source raster to create an output raster
Original Solution
Developed Python scripts that process the full source raster
Reality
• Source rasters keep getting bigger as resolution and required coverage increase
• Processes take far too long
• Soon run out of RAM
Options
• Get a bigger machine
• Enhance the procedures
Solution
Both
• New computer
• 28 cores (56 logical processors)
• 128GB RAM
• New Python tools
Split Source Raster into Parts and Process Simultaneously
Append Output Parts
Return Full Output Rasters
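The split–process–append idea can be sketched in plain Python. This is a minimal stand-in, not the actual tools: lists of values play the role of raster parts, and the function names (process_part, split_into_parts) are illustrative.

```python
from multiprocessing import Pool


def process_part(part):
    """Stand-in for per-part raster processing (e.g., applying a model)."""
    return [value * 2 for value in part]


def split_into_parts(values, n_parts):
    """Divide the full dataset into independent, roughly equal parts."""
    size = -(-len(values) // n_parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def main():
    source = list(range(12))            # stand-in for the full source raster
    parts = split_into_parts(source, 4)
    with Pool(processes=4) as pool:     # process parts simultaneously
        out_parts = pool.map(process_part, parts)
    # Append output parts to return the full output
    return [value for part in out_parts for value in part]


if __name__ == "__main__":
    print(main())
```

Because each part is processed independently, the number of simultaneous workers can be tuned to the machine (e.g., the 28-core system described above).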
Typical Data Quantities
Item                                    Size (GB)  Count  Total Size (GB)
Composite raster: 1-m NAIP (National
  Agriculture Imagery Program), .img
  format, 172,854 x 148,946 pixels         255.00      1           255.00
Composite raster band                       63.75      4           255.00
Derived raster                              63.75     19         1,211.25
Output raster                               63.75      1            63.75
TOTAL                                                            1,785.00
Conceptual Steps
• Prepare source and derived part rasters
• Generate zonal statistics data for training zones using source part rasters
• Prepare an R model using the zonal statistics data
• Run the R model for each part
• Join part output rasters together to create full final raster
Implementation
• Develop a set of Python scripts
• Run at the command line
• Use the same simple input file for all scripts
• Standardize the code and syntax
• Develop a clean data structure
Data Folder/File Structure
Level 1 Folder  Level 2 Folder  Level 3 Folder  Contents
Source\                                         Source raster
Parts\          Part_001\                       Source part raster
                Part_002\ etc.
Processing\     Part_001\       In\             Source band rasters, derived rasters
                                Out\            Part output raster
                Part_002\ etc.
Out\                                            Full output raster
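A structure like this can be generated programmatically so every run starts from the same clean layout. The sketch below uses pathlib; the function name and arguments are illustrative, not from the talk.

```python
from pathlib import Path


def create_data_structure(root, num_parts):
    """Create the Source/Parts/Processing/Out folder layout shown above."""
    root = Path(root)
    (root / "Source").mkdir(parents=True, exist_ok=True)
    (root / "Out").mkdir(exist_ok=True)
    for i in range(1, num_parts + 1):
        part = "Part_{:03d}".format(i)  # Part_001, Part_002, ...
        (root / "Parts" / part).mkdir(parents=True, exist_ok=True)
        (root / "Processing" / part / "In").mkdir(parents=True, exist_ok=True)
        (root / "Processing" / part / "Out").mkdir(parents=True, exist_ok=True)
```

Zero-padded part names (Part_001 rather than Part_1) keep folders in part order when listed alphabetically.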
Scripts (1)
• prepare_parts.py / create_part_rasters.py
• Extract source parts and create derived part rasters using multiprocessing
• Create part extents feature class
• zone_raster_relationships.py
• Identify zones per source part raster and store details in Zone-Raster Relationships table
• multi_raster_zonal_statistics.py
• Read records in Zone-Raster Relationships table
• Run Spatial Analyst Zonal Statistics as Table tool for each part using multiprocessing
• Compile data into single output table
Scripts (2)
• run_r_model_parallel.py
• Run R model for each part using multiprocessing
• Input: derived part rasters
• Output: single part output raster
• create_raster_from_parts.py
• Mosaic part output rasters to create final full output raster
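In the real tool the final join is a raster mosaic; the ordering logic alone can be sketched with plain Python. Here part outputs are lists keyed by part name, and join_parts is a hypothetical stand-in for the mosaic step.

```python
def join_parts(part_outputs):
    """Concatenate part outputs in part-name order to form the full output.

    part_outputs maps part names ('Part_001', ...) to lists of output
    values; zero-padded names sort correctly with plain sorted().
    """
    full = []
    for name in sorted(part_outputs):
        full.extend(part_outputs[name])
    return full
```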
Script: multi_raster_zonal_statistics.py
• Using multiprocessing for each part
• Create temporary folder
• Create temporary file geodatabase
• Create temporary zones feature class
• Create and run temporary Python script
• Runs Zonal Statistics as Table for part zones and source part raster
• Creates temporary output table
• Compile data from temporary zonal statistics table into single table
• Remove temporary data
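The isolated-workspace-per-part pattern can be sketched without arcpy. In the real script the worker creates a temporary file geodatabase and runs Zonal Statistics as Table; the stand-in below just writes and reads back a CSV table, but the create/use/clean-up shape is the same. The function name is illustrative.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def process_part_in_temp_workspace(part_name, rows):
    """Run one part's work in an isolated temporary workspace.

    Each process gets its own folder, so simultaneous parts never
    collide on intermediate files; the folder is removed afterward.
    """
    temp_dir = Path(tempfile.mkdtemp(prefix=part_name + "_"))
    try:
        table = temp_dir / "zonal_stats.csv"           # temporary output table
        with open(table, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        with open(table, newline="") as f:             # compile into one table
            return [tuple(record) for record in csv.reader(f)]
    finally:
        shutil.rmtree(temp_dir)                        # remove temporary data
```

Giving every process its own workspace is what makes the zonal statistics step safe to run for many parts at once.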
Basic Multiprocessing
# Imports
from multiprocessing import Process
import subprocess

# Function to run each process
def run_shell(command):
    p = subprocess.Popen(command)
    p.communicate()

def main(part_args_strings):
    tasks = []
    for args_str in part_args_strings:
        # Create process
        command = "python process_part.py " + args_str
        task = Process(target=run_shell, args=(command,))
        task.start()
        tasks.append(task)
    # Wait for all processes to finish
    for task in tasks:
        task.join()
Issues - Empty Parts
Issues - Spatial Analyst Licensing
• ArcGIS Desktop Python 2.7
Spatial Analyst license needs to be checked out for each task process
• ArcGIS Pro Python 3.6
No licensing restrictions on simultaneous use of Spatial Analyst tools
Issues - Multiprocessing Within ArcGIS Programs
Can't run multiprocessing within ArcGIS desktop programs
• Desktop (ArcMap, ArcCatalog)
Program hangs
• Pro
New Pro instance is opened every time a task command is issued
Hence, command line processing
Issues - ArcGIS Pro Python Licensing Error
• Can't run more than 34 processes at once
• For each additional process, an error occurs:
“RuntimeError: Not signed into Portal”
• Esri solution about to be tested
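Until the licensing issue is resolved, one workaround is to cap the number of simultaneous worker processes with a bounded multiprocessing Pool. The helper below is a sketch, not the actual tool; the 34-process cap reflects the limit observed above.

```python
from multiprocessing import Pool

MAX_PROCESSES = 34  # observed limit before the "Not signed into Portal" error


def run_with_cap(worker, jobs, max_processes=MAX_PROCESSES):
    """Run jobs with at most max_processes simultaneous processes.

    A bounded Pool queues the remaining jobs instead of launching
    one process per job all at once.
    """
    n_workers = min(max_processes, len(jobs)) or 1
    with Pool(processes=n_workers) as pool:
        return pool.map(worker, jobs)
```

With a Pool, parts beyond the cap simply wait in the queue, so the full job still completes without tripping the per-process limit.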
Considerations
Professional Development
• Want code to be as understandable as possible for client
• Provides a resource for learning and other coding needs
• Don't want to provide a black box
Clarity Triggers
• Suffering from confusion
• Can't remember what you did or how your process works
• Procedures are difficult to explain
If any of these apply, make processes clearer and simpler
Overcoming Lack of Clarity
• Refactor - split processes into basic components
• Simplify - refine and rearrange code
• Generalize - make code reusable wherever possible
Responding to Evolving Requirements
• Can't know all future requirements, but can make growth easier with clean and careful coding
• Try to think ahead and be prepared
• Sometimes need to step back, reevaluate, and refine approaches
• Iterative process as experience is gained
• Apply new ideas
• Address new requirements
Eliminate Barriers to Progress
• Always be ready to
• Reuse code
• Explain it
• Defend it
• You don't want to be the only person who can use and explain your procedures
Standards
• Use simple, consistent, and descriptive coding style and vocabulary
• Follow PEP-8 - Style Guide for Python Code
• https://www.python.org/dev/peps/pep-0008/
• PEP = Python Enhancement Proposal
• Use code inspector (e.g., PyCharm Inspect Code function)
• Implement clean, simple data structures
Wider Applicability
• Concept can be applied to any geoprocessing operation for which tasks can be separated into independent parts
• Processing requirements continually increasing
• Multiprocessing will become more important over time
Takeaway Messages
• Be innovative
• Be ready to evolve and adapt
• Apply helpful standards for all aspects of your processing requirements