Geospatial Analysis for Optimization at Environmental Sites

Build Geospatial Model

Once a general geospatial method has been selected, the data and CSM are used to select appropriate values for the parameters relevant to that method. The detailed method descriptions list the parameters that must be selected for each method. Building a geospatial model is generally an iterative process: results are generated and mapped from an initial model, and the model is then evaluated for accuracy and consistency with the CSM. Typically, alternative models with different parameter values are also constructed, and the accuracy of the results from the different models is compared using cross-validation. Often, many different models have similar cross-validation performance. In general, use the model with the fewest parameters that has good cross-validation performance.
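The cross-validation comparison described above can be sketched in a few lines. This is a minimal illustration rather than a method from this document: it uses synthetic data and two deliberately simple "models" (inverse distance weighting and a constant mean), and assumes NumPy is available.

```python
import numpy as np

def loocv_rmse(coords, values, predict):
    """Leave-one-out cross-validation: predict each point from the others."""
    errors = []
    for i in range(len(values)):
        mask = np.arange(len(values)) != i
        pred = predict(coords[mask], values[mask], coords[i])
        errors.append(pred - values[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted prediction at a single target location."""
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d == 0):
        return float(values[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * values) / np.sum(w))

def mean_predict(coords, values, target):
    """Baseline model: the sample mean everywhere."""
    return float(np.mean(values))

# Hypothetical synthetic data with a weak east-west trend.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(40, 2))
values = 0.05 * coords[:, 0] + rng.normal(0, 1, 40)

print("IDW RMSE :", round(loocv_rmse(coords, values, idw_predict), 3))
print("Mean RMSE:", round(loocv_rmse(coords, values, mean_predict), 3))
```

Leave-one-out cross-validation of this kind becomes expensive for very large data sets; k-fold cross-validation is a common alternative with the same interpretation.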


The goal of the model is to reproduce the variation in the data. In general, geospatial models can have three components: trend (large-scale variation), spatial correlation (small-scale variation), and error. The error component includes both measurement error and variation occurring on a scale smaller than the sample spacing. The first choice to make when selecting a model is whether it is necessary to model uncertainty in the results. If the main purpose is mapping, then there may be no need to carefully model the spatial correlation or error components. A simple method (or any method with default parameters) can be used, adjusting the method parameters to produce predictions that are visually appealing and consistent with the CSM.

If it is important to quantify uncertainty in the prediction, then use a more complex or advanced method. More complex methods (and also simple methods) represent all of the pattern in the data using the trend component. This practice works best when there are other explanatory variables available that are correlated with the primary variable of interest. Ideally, the trend component captures all of the important variation so that the residuals from the trend model have no remaining spatial correlation. If there is significant spatial correlation in the trend residuals, an advanced method is needed so that a spatial correlation component can be included in the model.

If an advanced method is used, then the standard approach is to use a relatively simple model for the trend, based on a low-order polynomial regression on the coordinates. In some cases, a more detailed trend model using local or multiple regression may give better results. The trend model should ideally have a physical basis, such as distance to a contamination source. A good trend model is particularly important if the model is used to extrapolate beyond the area of existing data.
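A low-order polynomial trend of the kind described above is an ordinary least-squares regression on the coordinates. The sketch below fits a first-order (planar) trend to hypothetical synthetic data using NumPy; the coefficients and data are illustrative, not taken from this document.

```python
import numpy as np

# Hypothetical sample data: coordinates (x, y) and measured concentrations.
rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(30, 2))
z = 2.0 + 0.5 * xy[:, 0] - 0.3 * xy[:, 1] + rng.normal(0, 0.2, 30)

# First-order (planar) trend: z ~ b0 + b1*x + b2*y, fit by least squares.
X = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
residuals = z - X @ beta

print("Fitted coefficients:", np.round(beta, 2))
print("Residual std dev   :", round(residuals.std(), 3))
```

A second-order trend simply adds the columns x², y², and x·y to the design matrix; cross-validation can then indicate whether the extra parameters are justified.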

The effect of the choice of trend model can be evaluated using cross-validation. The residuals after removing the trend from the data should be approximately stationary, in order to allow the spatial correlation model to be estimated from the data. Stationarity means that the statistical properties (such as spatial correlation) are the same in different locations. This requirement allows data from different locations to be combined to estimate the spatial correlation model that is applicable throughout the area sampled.

In addition to detrending, the data may also need to be transformed to be normally distributed. An assumption of normality is required in order to quantitatively use the estimates of prediction uncertainty. Otherwise, the uncertainty estimates can only be used in a relative sense to determine where the prediction uncertainty is larger and where it is smaller.
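The effect of a normalizing transformation can be checked with a simple skewness statistic. The sketch below uses hypothetical lognormal data (a common distributional model for environmental concentrations) to show the log transform pulling the skewness toward zero; the data and threshold are illustrative assumptions.

```python
import numpy as np

# Hypothetical right-skewed concentration data.
rng = np.random.default_rng(2)
conc = rng.lognormal(mean=5.0, sigma=1.0, size=200)

def skewness(x):
    """Sample skewness: third standardized moment (0 for symmetric data)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(((x - x.mean()) / x.std())**3))

log_conc = np.log(conc)
print("Skewness, raw data:", round(skewness(conc), 2))      # strongly positive
print("Skewness, log data:", round(skewness(log_conc), 2))  # near zero
```

Histograms and normal quantile-quantile plots give the same check graphically and are worth inspecting alongside the statistic.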

After detrending and transformation, the data can be used to model spatial correlation. The spatial correlation component of the model consists of a variogram (or covariogram) model that is fit to the transformed or detrended data. The process for fitting a variogram model to the empirical variogram is as follows:

  • Choose a suitable variogram model, with or without a nugget.
  • Fit the model by eye or with software.
  • Examine the resulting fit visually and using cross-validation.
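The steps above can be sketched as follows. This is a minimal, illustrative implementation with synthetic data: a binned empirical semivariogram plus a spherical model function that could then be fit by eye or with curve-fitting software.

```python
import numpy as np

def empirical_variogram(coords, values, bins):
    """Average semivariance 0.5*(z_i - z_j)^2 per lag-distance bin."""
    n = len(values)
    i, j = np.triu_indices(n, k=1)               # all unique point pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sv = 0.5 * (values[i] - values[j])**2
    which = np.digitize(d, bins)
    lags, gammas = [], []
    for b in range(1, len(bins)):
        sel = which == b
        if sel.any():
            lags.append(d[sel].mean())
            gammas.append(sv[sel].mean())
    return np.array(lags), np.array(gammas)

def spherical(h, nugget, sill, rng_):
    """Spherical variogram model: rises from the nugget to the sill at range rng_."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_)**3)
    return np.where(h < rng_, g, sill)

# Hypothetical synthetic data with smooth spatial structure plus noise.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, size=(80, 2))
values = np.sin(coords[:, 0] / 20) + rng.normal(0, 0.3, 80)

lags, gammas = empirical_variogram(coords, values, np.linspace(0, 60, 13))
print("lags  :", np.round(lags, 1))
print("gammas:", np.round(gammas, 2))
```

The candidate model parameters (nugget, sill, range) can be tuned by comparing `spherical(lags, ...)` against `gammas`, either visually or with a least-squares curve fitter.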

Another modeling decision is the choice of search neighborhood. The search neighborhood determines how many nearby points will be used for prediction at each location. There are several reasons for limiting the number of points used through a search neighborhood. First, if there are more than several hundred data points, it may be computationally infeasible to use all of the data. Second, due to uncertainties in the variogram model at larger lag distances, a smaller search neighborhood may give more accurate predictions than a larger search neighborhood. Finally, using a search neighborhood makes the assumption of stationarity much easier to meet, since only each local neighborhood needs to be stationary instead of the entire data domain. As with all modeling choices, model cross-validation can assist with determining the best approach.
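A search neighborhood can be implemented by restricting prediction to the k nearest data points. The sketch below applies this idea to inverse distance weighting with hypothetical synthetic data; kriging software applies the same restriction when assembling the kriging system.

```python
import numpy as np

def neighborhood_idw(coords, values, target, k=8, power=2.0):
    """IDW prediction using only the k nearest points (the search
    neighborhood) instead of the full data set."""
    d = np.linalg.norm(coords - target, axis=1)
    nearest = np.argsort(d)[:k]          # indices of the k closest points
    dn, vn = d[nearest], values[nearest]
    if np.any(dn == 0):
        return float(vn[np.argmin(dn)])  # exact hit on a data point
    w = 1.0 / dn**power
    return float(np.sum(w * vn) / np.sum(w))

# Hypothetical data set large enough for a neighborhood to matter.
rng = np.random.default_rng(4)
coords = rng.uniform(0, 100, size=(500, 2))
values = 0.1 * coords[:, 0] + rng.normal(0, 0.5, 500)

pred = neighborhood_idw(coords, values, np.array([50.0, 50.0]))
print("Neighborhood IDW prediction at (50, 50):", round(pred, 2))
```

For repeated predictions over a grid, a spatial index (such as a k-d tree) makes the nearest-neighbor search far cheaper than sorting all distances at every location.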

Example


To illustrate the model-building work flow, consider an alternative approach to modeling the Meuse River zinc data. Instead of the nonparametric regression used previously, this example achieves similar results with other methods. The zinc concentration data are skewed, so the process begins by log transforming the data to make them more normally distributed. The empirical variogram of the log-transformed data is shown in Figure 42. Potential theoretical variogram models are selected by examining the empirical variogram; the spherical and exponential models are reasonable candidates because their shapes are similar to that of the empirical variogram. These two models were fit to the empirical variogram using software, with the resulting fits shown in Figures 43 and 44.


Figure 42. Empirical variogram of log zinc concentrations.


Figure 43. Spherical variogram model fit.


Figure 44. Exponential variogram model fit.

The EDA conducted in the regression example shows that the zinc concentration is correlated with distance from the river. There is a linear relationship between log zinc concentration and the square root of distance from the river. An alternative trend model can next be developed based on a regression on the square root of distance. After fitting this trend model, a variogram of the residuals is fit using a spherical model as shown in Figure 45.
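The trend regression described here, log concentration against the square root of distance, is an ordinary least-squares fit. The sketch below uses synthetic stand-in data (not the actual Meuse measurements) to illustrate the fit and the resulting residuals.

```python
import numpy as np

# Hypothetical stand-in for the Meuse data: log zinc concentration declining
# linearly with the square root of distance to the river.
rng = np.random.default_rng(5)
dist = rng.uniform(10, 1000, size=60)                    # distance to river (m)
log_zinc = 7.0 - 0.08 * np.sqrt(dist) + rng.normal(0, 0.3, 60)

# Fit log_zinc ~ b0 + b1*sqrt(dist) by least squares.
X = np.column_stack([np.ones_like(dist), np.sqrt(dist)])
beta, *_ = np.linalg.lstsq(X, log_zinc, rcond=None)
residuals = log_zinc - X @ beta

print("intercept, slope:", np.round(beta, 3))
print("residual std dev:", round(residuals.std(), 3))
```

These residuals are what the variogram in Figure 45 is fit to; a successful trend model leaves residuals with a smaller sill than the raw data.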


Figure 45. Spherical variogram fit to detrended data.

Visually, these variogram models all appear to fit the data well. How do predictions based on these models compare with each other, and with an interpolation using IDW? Figure 46 shows the log zinc predictions for four approaches: (1) IDW; (2) ordinary kriging (OK) with the spherical variogram; (3) OK with the exponential variogram; and (4) kriging with external drift (KED, also called kriging with an external trend) with the spherical variogram. Ordinary kriging uses a constant for the trend component (no trend), while kriging with external drift uses the regression on distance to the river for the trend component. The kriging predictions resemble each other and are much smoother than the IDW predictions. In the next section, cross-validation is used to quantitatively evaluate the quality of these fits.
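Ordinary kriging at a single location amounts to solving a small linear system for weights that sum to one. The sketch below is a minimal, illustrative implementation using synthetic data and an assumed spherical variogram; production software adds search neighborhoods, anisotropy, and numerical safeguards.

```python
import numpy as np

def spherical(h, nugget, sill, rng_):
    """Spherical variogram model (reaches the sill at range rng_)."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_)**3)
    return np.where(h < rng_, g, sill)

def ordinary_kriging(coords, values, target, vario):
    """Ordinary kriging prediction at one location: solve for weights that
    sum to 1 and minimize error variance under the variogram model."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = vario(d)
    np.fill_diagonal(A[:n, :n], 0.0)  # gamma(0) = 0; the nugget applies just off the origin
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = vario(np.linalg.norm(coords - target, axis=1))
    w = np.linalg.solve(A, b)         # last entry is the Lagrange multiplier
    return float(w[:n] @ values)

# Hypothetical synthetic data with a gentle east-west trend.
rng = np.random.default_rng(6)
coords = rng.uniform(0, 100, size=(30, 2))
values = 0.05 * coords[:, 0] + rng.normal(0, 0.2, 30)

vario = lambda h: spherical(h, nugget=0.05, sill=1.0, rng_=50.0)
pred = ordinary_kriging(coords, values, np.array([50.0, 50.0]), vario)
print("OK prediction at (50, 50):", round(pred, 2))
```

Kriging with external drift extends this system by appending the drift variable (here, something like the square root of distance to the river) as additional rows and columns, which is why its predictions track the trend while OK predictions revert to a local mean.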


Figure 46. Predictions from four models.

image_pdfPrint this page/section



GRO

web document
glossaryGRO Glossary
referencesGRO References
acronymsGRO Acronyms
ITRC
Contact Us
About ITRC
Visit ITRC
social media iconsClick here to visit ITRC on FacebookClick here to visit ITRC on TwitterClick here to visit ITRC on LinkedInITRC on Social Media
about_itrc
Permission is granted to refer to or quote from this publication with the customary acknowledgment of the source (see suggested citation and disclaimer). This web site is owned by ITRC • 1250 H Street, NW • Suite 850 • Washington, DC 20005 • (202) 266-4933 • Email: [email protected] • Terms of Service, Privacy Policy, and Usage Policy ITRC is sponsored by the Environmental Council of the States.