5 Data interpolation

Variables which enter hydrological analysis like temperature or precipitation are measured locally at point locations. Many of these qualities are then recalculated on different domains using interpolation or extrapolation algorithms. We interpolate within the boundary given by the values which are known to us and extrapolate outside this boundary. Here the two common methods for interpolation are demonstrated.

5.1 Inverse distance weighting (IDW)

IDW is a simple deterministic interpolation method, which is also non-parametric. The interpolated points are calculated with a weighted average of the values which are at disposal. The space is a weighted average of the distribution of points and the weight assigned to each point decrease with increasing distance from the interpolated point.

The point value \(Z_p\) is calculated with the knowledge of \(z_i\) and distance \(d_i\) to the power of \(P\).

\[ Z_L = \dfrac{\sum\limits_{i=1}^{n}\dfrac{z_i}{d_i^p}}{\sum\limits_{i=1}^{n}\dfrac{1}{d_i^p}} \] Let’s imagine we have three values \(z_1 = 10\), \(z_2 = 15\), \(z_3 = 20\) and we need to estimate a value in a location \(L\). The distances to the location from these points are: \(d_1 = 3.0\), \(d_2 = 2.0\), \(d_3 = 1.5\). The commonly used power \(p\) is \(2\).
So the respective values of weights for these are:

\[ \begin{array}{c} w_1 = \dfrac{10}{3.0^2}\\ w_2 = \dfrac{15}{2.0^2}\\ w_3 = \dfrac{20}{1.5^2} \end{array} \]

Hence the numerator and denominator look as follows:

\[ \begin{array}{l} N = \dfrac{10}{2.0^2} + \dfrac{15}{2.0^2} + \dfrac{20}{1.5^2}\approx 2.5 + 2.2222 + 6.6666 = 11.3888\\ D = 0.25 + 0.1111 + 0.4444 = 0.8055\\ \end{array} \]

Since this method is very simple, let’s calculate it on our own, in a step-by-step manner.

First we generate some point in the spatial domain, which will represent our measurements.
Next we create a new domain, spatially regular to which we will interpolate.
Finally we will perform the calculations and visualize the results.

5.1.1 Generation of random measurements network

We will create 25 points in the space and also assign some coordinate reference system. These points will be used for both of the methods.

Code

library(sf)          # Spatial Feature library for spatial data
library(scico)       # Scientific palette collection called "scico"

n <- 30
dom <- data.frame(x = runif(n, 0, 50),
                  y = runif(n, 0, 50),
                  value = rnorm(n, mean = 5)) |>
  sf::st_as_sf(coords = c("x", "y"),
               crs = 4326)
plot(dom,
     nbreaks = n + 1,
     pal = scico(n = n,
                 palette = "davos",
                 end = 0.9,
                 direction = -1),
     main = "Precipitation [mm]",
     pch  = 20,          # Point character selection
     graticule = TRUE,   # Display graticules
     axes = TRUE,        # Display plot axes
     bgc = "#f0f0f033",  # Background color
     key.pos = 1)        # Legend position

1: Create \(25\) points with random values, which will serve as the computation origin,
2: store them as SimpleFeatures object with coordinate reference system via EPSG, search in https://epsg.io
3: Specify sf:::plot.sf() function and use scientific colormaps from scico package.

We do have the original points, which are scarcely distributed across the domain. Now we need a grid of new points, which will represent the centroids of a raster, on which we want to recalculate.

Code

grid <- st_make_grid(x = dom,
                     cellsize = 2,
                     crs = 4326) |>
  st_sf()
plot(x = dom,
     breaks = seq(min(dom$value), max(dom$value), length.out = n),
     pal = scico(n = n - 1,
                 palette   = "davos",
                 end       = 0.9,
                 direction = -1),
     main      = "Precipitation [mm]",
     pch       = 20,          # Point character selection
     graticule = TRUE,        # Display graticules
     axes      = TRUE,        # Display plot axes
     bgc       = "#f0f0f033", # Background color
     key.pos   = 1, reset = FALSE)           # Legend position
plot(x = st_geometry(grid), lwd = 0.1, add = TRUE, reset = TRUE)

1: Construct a regular grid using extreme points as boundary limits.
2: Plot with the points.

Point location of Precipitation measuring stations

Now we compute the distances between the points using the outer() function

Code

distances <- sapply(1:nrow(st_coordinates(grid)), function(i) {
  sqrt((st_coordinates(grid)[i, "X"] - st_coordinates(dom)[, "X"])^2 +
       (st_coordinates(grid)[i, "Y"] - st_coordinates(dom)[, "Y"])^2)
})

and use these distances for missing values computation

Code

epsilon <- 1e-6
distances <- distances + epsilon
power <- 2
# Compute weighted sums for all grid points
weighted_sum <- sapply(1:ncol(distances), function(i) {
  sum(dom$value / (distances[, i]^power))
})

# Compute weights sum for all grid points
weights_sum <- sapply(1:ncol(distances), function(i) { 
  sum(1 / (distances[, i]^power))
})

# Calculate interpolated values for all grid points
interpolated_values <- weighted_sum / weights_sum

# Create a spatial data frame for results
results <- data.frame(
  x = st_coordinates(grid)[, "X"],
  y = st_coordinates(grid)[, "Y"],
  value = interpolated_values
) |> 
  sf::st_as_sf(coords = c("x", "y"), crs = 4326)

# Replace NaN values with NA for plotting
results$value[is.nan(results$value)] <- NA

# Join results with the grid for visualization
join <- st_join(grid, results)

1: Set a tolerance for when the distance is equal to 0.
2: A power for the distance.

At last we visualize the results.

Code

plot(x = join, 
     breaks = seq(min(results$value), max(results$value), length.out = n),
     lwd = 0.001,
     pal = scico(n - 1, 
                 palette = "davos",
                 end       = 0.9, 
                 direction = -1),
     main      = "Precipitation [mm]", 
     #pch       = 20,          # Point character selection
     graticule = TRUE,        # Display graticules
     axes      = TRUE,        # Display plot axes
     bgc       = "#f0f0f033", # Background color
     key.pos   = 1, 
     reset = FALSE)           # Legend position

5.2 `gstat`

There are many libraries listed in CRAN geostatistics task view. One of these is called gstat, it was developed and is maintained by Edzer Pebesma, who is also behind the raster and terra packages. The gstat package contains functions no olny for interpolations.

Code

library(gstat)

Warning: package 'gstat' was built under R version 4.4.1

Code

idw_res <- gstat::idw(formula = value ~ 1, 
                      locations = dom, 
                      newdata = grid)

[inverse distance weighted interpolation]

Code

idw_res <- st_as_sf(idw_res)
plot(idw_res |> 
       dplyr::select(var1.pred), 
     breaks = seq(min(idw_res$var1.pred), 
                  max(idw_res$var1.pred), 
                  length.out = n),
     lwd = 0.001,
     pal = scico(n - 1, 
                 palette = "davos",
                 end       = 0.9, 
                 direction = -1),
     main      = "Precipitation [mm]", 
     #pch       = 20,          # Point character selection
     graticule = TRUE,        # Display graticules
     axes      = TRUE,        # Display plot axes
     bgc       = "#f0f0f033", # Background color
     key.pos   = 1, 
     reset = FALSE)

Code

power <- 2  # You can adjust the power parameter
weighted_sum <- apply(distances, 1, function(d) sum(dom$value / (d^power)))
weighted_sum

5.3 Krigging

Another interpolation technique is called Krigging. Opposing to the Inverse Distance Weighted method.

Code

projected_crs <- 32633  # Replace with the appropriate UTM zone for your data

dom_projected <- st_transform(dom, crs = projected_crs)
grid_projected <- st_transform(grid, crs = projected_crs)

set.seed(123)  # For reproducibility
idw_points <- st_centroid(idw_res)

Warning: st_centroid assumes attributes are constant over geometries

Code

sampled_points <- idw_points[sample(1:nrow(idw_points), 50), ]

combined_points <- rbind(
  dom,
  sampled_points |> 
  dplyr::select(value = var1.pred, geometry)
)
combined_points <- st_transform(combined_points, crs = projected_crs)

An essential part of kriging is definition of a variogram, which is a function describing the degree of spatial dependence of a spatial process. Due to the range parameter, the spatial data needs to be projected.

Code

variogram <- vgm(psill = 5.05, model = "Exp", range = 300000, nugget = 0.15)

Code

krig_res <- gstat::krige(
  formula = value ~ 1,
  locations = combined_points,
  newdata = grid_projected,
  model = variogram  # Use the appropriate fitted variogram model
)

[using ordinary kriging]

Code

# Step 3: Reproject the kriged result back to the original CRS (WGS84)
krig_res <- st_transform(st_as_sf(krig_res), crs = 4326)
# Generate unique breaks
min_val <- min(krig_res$var1.pred, na.rm = TRUE)
max_val <- max(krig_res$var1.pred, na.rm = TRUE)

# Ensure that min and max are sufficiently different to create unique breaks
if (max_val - min_val > 0) {
  breaks <- seq(min_val, max_val, length.out = 25)
} else {
  breaks <- unique(c(min_val, max_val))  # Fallback for nearly constant data
}

# Use the scico color palette with the correct number of colors
colors <- scico(length(breaks) - 1, 
                palette = "davos", 
                end = 0.9, 
                direction = -1)

# Plotting using the 'pal' argument for sf objects
plot(
  krig_res |> dplyr::select(var1.pred), 
  breaks = breaks,
  pal = colors,        # Use 'pal' to specify the color palette
  lwd = 0.001,
  main = "Precipitation [mm]", 
  graticule = TRUE, 
  axes = TRUE, 
  bgc = "#f0f0f033", 
  key.pos = 1, 
  reset = FALSE
)

5.1 Inverse distance weighting (IDW)

5.1.1 Generation of random measurements network

5.2 gstat

5.3 Krigging

5.2 `gstat`