| Title: | Coarse-to-Fine Spatial Modeling |
|---|---|
| Description: | Provides functions for coarse-to-fine spatial modeling (CFSM), enabling fast spatial prediction, regression, and uncertainty quantification. This method is suitable for moderate to large samples. For further details, see Murakami et al. (2026) <doi:10.1111/gean.70034>. |
| Authors: | Daisuke Murakami [aut, cre], Alexis Comber [aut], Takahiro Yoshida [aut], Narumasa Tsutsumida [aut], Chris Brunsdon [aut], Tomoki Nakaya [aut] |
| Maintainer: | Daisuke Murakami <[email protected]> |
| License: | GPL (>=2) |
| Version: | 0.1.1 |
| Built: | 2026-05-29 19:36:14 UTC |
| Source: | https://github.com/dmuraka/spcf |
Prediction and regression via CF-GLMMs.
cf_glm( y, x = NULL, coords, offset = NULL, x0 = NULL, coords0 = NULL, offset0 = NULL, mod_hv )cf_glm( y, x = NULL, coords, offset = NULL, x0 = NULL, coords0 = NULL, offset0 = NULL, mod_hv )
y |
Vector of response variables (N x 1) including continuous, count, and binary responses, following an exponential family distribution. |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
offset |
Optional. Vector of offset variable (N x 1) to be included
in the linear predictor. It is consistent with that of |
x0 |
Optional. Matrix of covariates at prediction sites (N0 x K). |
coords0 |
Optional. Matrix of 2-dimensional point coordinates at prediction sites (N0 x 2). |
offset0 |
Optional. Vector of offset variables at prediction sites (N0 x 1) |
mod_hv |
Output object of the |
A list with the following elements:
Regression coefficients, their standard errors, and the lower and upper limits of the 95 percent confidence intervals.
Standard deviation of the regression term (xb), spatial process (spatial_scale1, spatial_scale2,...), additional learning, and residuals.
Error statistics for the validation samples: pseudo R-squared, root mean squared error (RMSE), and mean absolute error (MAE).
Predictive means and standard deviations (sample sites).
Predictive means and standard deviations (prediction sites).
Predictive quantiles on the response scale at the sample
sites. A data frame whose columns q0.005, q0.025,
q0.05, q0.1, ..., q0.9, q0.95, q0.975,
q0.995 give the corresponding quantile levels, obtained by
Gaussian approximation on the link scale followed by inverse-link
transformation.
Predictive quantiles on the response scale at the
prediction sites. Column structure is identical to pred_q.
NULL when prediction sites are not supplied.
Bandwidth values for each scale. The i-th bandwidth is used for the spatial process corresponding to the i-th column of the Z matrix.
Predictive mean of the spatial process in each scale (sample sites; list).
Predictive standard deviation of the spatial process in each scale (sample sites; list).
Predictive mean of the spatial process in each scale (prediction sites; list).
Predictive standard deviation of the spatial process in each scale (prediction sites; list).
Other internal output objects.
Daisuke Murakami
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2025). Coarse-to-fine spatial GLMMs for scalable prediction and multiscale analysis. *ArXiv*.
################ Example 1: Count data modeling/Disease mapping/smoothing set.seed(1234) require( CARBayesdata ) require( sf ) data(pollutionhealthdata) data(GGHB.IZ) ### Data dat <- pollutionhealthdata[pollutionhealthdata$year==2011,] y <- dat[,"observed"] # count data x <- dat[,c("pm10","jsa","price")] offset <- log(dat[,"expected"]) coords <- st_coordinates(st_centroid(GGHB.IZ)) ### Holdout validation optimizing the number of spatial scales mod_hv <- cf_glm_hv(y = y, x = x, offset=offset, coords = coords, family=poisson()) ### Spatial modeling and prediction mod <- cf_glm(y = y, x = x, coords = coords, mod_hv = mod_hv) mod ### Mapping predictive mean and standard deviations (SD) GGHB.IZ$y <- y GGHB.IZ$pred <- mod$pred$pred GGHB.IZ$pred_sd<- mod$pred$pred_sd plot(GGHB.IZ[,c("pred")],lwd=0.2,axes=TRUE, key.pos=4,nbreaks=50) # Predictive mean plot(GGHB.IZ[,c("pred_sd")],lwd=0.2,axes=TRUE, key.pos=4,nbreaks=50)# Predictive SD ### Multiscale spatial pattern/feature extraction mod_s1 <- sp_scalewise(mod,bw_range=c(4000,Inf)) # Large scale (4000 <= bandwidth) mod_s2 <- sp_scalewise(mod,bw_range=c(0,4000)) # Small scale (bandwidth <= 4000) GGHB.IZ$z1 <- mod_s1$pred$pred GGHB.IZ$z2 <- mod_s2$pred$pred plot(GGHB.IZ[,c("z1","z2")],lwd=0.2,axes=TRUE,key.pos=4, nbreaks=50)# Extracted features ################ Example 2: Binary data modeling/spatial prediction set.seed(1234) require(sp); require(sf) data(meuse) data(meuse.grid) ### Data y <- ifelse(meuse$ffreq==1, 1, 0 )# binary data coords <- meuse[,c("x","y")] x <- meuse[,"dist"] ### Data at prediction sites coords0 <- meuse.grid[,c("x","y")] x0 <- meuse.grid[,"dist"] ### Holdout validation optimizing the number of spatial scales mod_hv <- cf_glm_hv(y = y, x = x, coords = coords, family=binomial()) ### Spatial modeling and prediction mod <- cf_glm(y = y, x=x, coords = coords, x0=x0, coords0 = coords0, mod_hv = mod_hv) mod ### Mapping predictive mean and standard deviations (SD) meuse.grid$pred <- mod$pred0$pred meuse.grid$pred_sd<- mod$pred0$pred_sd meuse.grid_sf <- st_as_sf(meuse.grid, coords = c("x","y")) plot(meuse.grid_sf[,"pred"], pch = 15, cex = 0.8, nbreaks = 20) # Predictive mean plot(meuse.grid_sf[,"pred_sd"], pch = 15, cex = 0.8, nbreaks = 20)# Predictive SD ### Multiscale spatial pattern/feature extraction mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth) mod_s2<- sp_scalewise(mod,bw_range=c(0,1000)) # Small scale (0 <= bandwidth <= 1000) meuse.grid_sf$z1 <- mod_s1$pred0$pred meuse.grid_sf$z2 <- mod_s2$pred0$pred plot(meuse.grid_sf[,c("z1","z2")], pch = 15, cex = 0.5, nbreaks = 20,axes=TRUE) # Predictive means################ Example 1: Count data modeling/Disease mapping/smoothing set.seed(1234) require( CARBayesdata ) require( sf ) data(pollutionhealthdata) data(GGHB.IZ) ### Data dat <- pollutionhealthdata[pollutionhealthdata$year==2011,] y <- dat[,"observed"] # count data x <- dat[,c("pm10","jsa","price")] offset <- log(dat[,"expected"]) coords <- st_coordinates(st_centroid(GGHB.IZ)) ### Holdout validation optimizing the number of spatial scales mod_hv <- cf_glm_hv(y = y, x = x, offset=offset, coords = coords, family=poisson()) ### Spatial modeling and prediction mod <- cf_glm(y = y, x = x, coords = coords, mod_hv = mod_hv) mod ### Mapping predictive mean and standard deviations (SD) GGHB.IZ$y <- y GGHB.IZ$pred <- mod$pred$pred GGHB.IZ$pred_sd<- mod$pred$pred_sd plot(GGHB.IZ[,c("pred")],lwd=0.2,axes=TRUE, key.pos=4,nbreaks=50) # Predictive mean plot(GGHB.IZ[,c("pred_sd")],lwd=0.2,axes=TRUE, key.pos=4,nbreaks=50)# Predictive SD ### Multiscale spatial pattern/feature extraction mod_s1 <- sp_scalewise(mod,bw_range=c(4000,Inf)) # Large scale (4000 <= bandwidth) mod_s2 <- sp_scalewise(mod,bw_range=c(0,4000)) # Small scale (bandwidth <= 4000) GGHB.IZ$z1 <- mod_s1$pred$pred GGHB.IZ$z2 <- mod_s2$pred$pred plot(GGHB.IZ[,c("z1","z2")],lwd=0.2,axes=TRUE,key.pos=4, nbreaks=50)# Extracted features ################ Example 2: Binary data modeling/spatial prediction set.seed(1234) require(sp); require(sf) data(meuse) data(meuse.grid) ### Data y <- ifelse(meuse$ffreq==1, 1, 0 )# binary data coords <- meuse[,c("x","y")] x <- meuse[,"dist"] ### Data at prediction sites coords0 <- meuse.grid[,c("x","y")] x0 <- meuse.grid[,"dist"] ### Holdout validation optimizing the number of spatial scales mod_hv <- cf_glm_hv(y = y, x = x, coords = coords, family=binomial()) ### Spatial modeling and prediction mod <- cf_glm(y = y, x=x, coords = coords, x0=x0, coords0 = coords0, mod_hv = mod_hv) mod ### Mapping predictive mean and standard deviations (SD) meuse.grid$pred <- mod$pred0$pred meuse.grid$pred_sd<- mod$pred0$pred_sd meuse.grid_sf <- st_as_sf(meuse.grid, coords = c("x","y")) plot(meuse.grid_sf[,"pred"], pch = 15, cex = 0.8, nbreaks = 20) # Predictive mean plot(meuse.grid_sf[,"pred_sd"], pch = 15, cex = 0.8, nbreaks = 20)# Predictive SD ### Multiscale spatial pattern/feature extraction mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth) mod_s2<- sp_scalewise(mod,bw_range=c(0,1000)) # Small scale (0 <= bandwidth <= 1000) meuse.grid_sf$z1 <- mod_s1$pred0$pred meuse.grid_sf$z2 <- mod_s2$pred0$pred plot(meuse.grid_sf[,c("z1","z2")], pch = 15, cex = 0.5, nbreaks = 20,axes=TRUE) # Predictive means
Trains a coarse-to-fine spatial GLMMs (CF-GLMMs) and optimizes the spatial scale through progressive holdout validation.
cf_glm_hv( y, x = NULL, coords, offset = NULL, train_rat = 0.75, id_train = NULL, alpha = 0.9, kernel = "exp", family = gaussian(), seed = 1234 )cf_glm_hv( y, x = NULL, coords, offset = NULL, train_rat = 0.75, id_train = NULL, alpha = 0.9, kernel = "exp", family = gaussian(), seed = 1234 )
y |
Vector of response variables (N x 1) including continuous, count, and binary responses, following an exponential family distribution. |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
offset |
Optional. Vector of offset variable (N x 1) to be included
in the linear predictor. It is consistent with that of |
train_rat |
Training sample ratio (default: 0.75). For small to moderate samples (N <= 30000), samples closest to the k-means centers are used for validation samples. For larger samples, training samples are drawn at random. |
id_train |
Optional. If specified, the corresponding samples are used as training samples. Otherwise, training samples are chosen based on 'train_rat'. |
alpha |
Decay ratio of the kernel bandwidth in the coarse-to-fine training (default: 0.9). As it approaches one, the optimization becomes more stringent but requires longer computation time. |
kernel |
Kernel type for modeling spatial dependence. '"exp"' for the exponential kernel (default) and '"gau"' for the Gaussian kernel. |
family |
Description of the error distribution and link function
consistent with the 'family' argument in the |
seed |
Random seed used for the training/validation split when 'id_train' is not supplied. Defaults to '1234', which makes the split reproducible across calls. Set to 'NULL' to allow each call to draw a different split (useful for assessing sensitivity to the split). |
A list with the following elements:
Deviance loss for validation samples.
All the deviance losses obtained in each learning step.
ID of training samples.
List of other outcomes, which are internally used.
Daisuke Murakami
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2025). Coarse-to-fine spatial GLMMs for scalable prediction and multiscale analysis. *ArXiv*.
Prediction and regression via coarse-to-fine spatial modeling.
cf_lm(y, x = NULL, coords, x0 = NULL, coords0 = NULL, mod_hv)cf_lm(y, x = NULL, coords, x0 = NULL, coords0 = NULL, mod_hv)
y |
Vector of response variables (N x 1). |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
x0 |
Optional. Matrix of covariates at prediction sites (N0 x K). |
coords0 |
Optional. Matrix of 2-dimensional point coordinates at prediction sites (N0 x 2). |
mod_hv |
Output object of the |
A list with the following elements:
Regression coefficients, their standard errors, and the lower and upper limits of the 95 percent confidence intervals.
Standard deviation of the regression term (xb), spatial process (spatial_scale1, spatial_scale2,...), additionally learned components (effective if 'cf_lm_hv/add_learn' is not 'none'), and residuals.
R-squared for the validation samples (validation_R2), root mean squared error for the validation samples (validation_RMSE), and the residual standard deviation (residual_SD).
Predictive means and standard deviations (sample sites).
Predictive means and standard deviations (prediction sites).
Bandwidth values for each scale. The i-th bandwidth is used to describe the spatial process corresponding to the i-th column of the Z matrix.
Predictive means of the single-scale processes at each scale, corresponding to each bandwidth value (sample sites; list).
Predictive standard deviation of the spatial processes corresponding to in each bandwidth (sample sites; list).
Predictive mean of the spatial process corresponding to each bandwidth (prediction sites; list).
Predictive standard deviation of the spatial process corresponding to in each bandwidth (prediction sites; list).
Other internal output objects.
Daisuke Murakami
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2026). Coarse-to-fine spatial modeling: A scalable, machine-learning-compatible framework. *Geographical Analysis*, 58(2), e70034. https://onlinelibrary.wiley.com/doi/10.1111/gean.70034
cf_glm, cf_lm_hv, sp_scalewise
set.seed(123) require(sp); require(sf) data(meuse) data(meuse.grid) ### Data y <- log(meuse[,"zinc"]) coords <- meuse[,c("x","y")] x <- data.frame(dist = meuse[,"dist"], ffreq2 = as.integer(meuse$ffreq == 2), ffreq3 = as.integer(meuse$ffreq == 3)) ### Data at prediction sites coords0 <- meuse.grid[,c("x","y")] x0 <- data.frame(dist = meuse.grid[,"dist"], ffreq2 = as.integer(meuse.grid$ffreq == 2), ffreq3 = as.integer(meuse.grid$ffreq == 3)) ### Holdout validation optimizing the number of spatial scales mod_hv <- cf_lm_hv(y = y, x = x, coords = coords, add_learn = "none") ### Spatial modeling and prediction mod <- cf_lm(y = y, x = x, x0 = x0, coords = coords, coords0 = coords0, mod_hv = mod_hv) mod ### Mapping predictive mean and standard deviations (SD) meuse.grid$pred <- mod$pred0$pred meuse.grid$pred_sd<- mod$pred0$pred_sd meuse.grid_sf <- st_as_sf(meuse.grid, coords = c("x","y")) plot(meuse.grid_sf[,"pred"], pch = 15, cex = 0.5, nbreaks = 20) # Predictive mean plot(meuse.grid_sf[,"pred_sd"], pch = 15, cex = 0.5, nbreaks = 20)# Predictive SD ### Multiscale spatial pattern/feature extraction mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth) mod_s2<- sp_scalewise(mod,bw_range=c(500,1000)) # Middle scale (500 <= bandwidth <= 1000) mod_s3<- sp_scalewise(mod,bw_range=c(0,500)) # Small scale (bandwidth <= 500) z1 <- mod_s1$pred0$pred # Predictive mean z2 <- mod_s2$pred0$pred z3 <- mod_s3$pred0$pred z1_sd <- mod_s1$pred0$pred_sd # Predictive SD z2_sd <- mod_s2$pred0$pred_sd z3_sd <- mod_s3$pred0$pred_sd meuse.grid_sf3 <- cbind(meuse.grid_sf, z1, z2, z3, z1_sd, z2_sd, z3_sd) plot(meuse.grid_sf3[,c("z1","z2","z3")], pch = 15, cex = 0.5, nbreaks = 20,key.pos=4,axes=TRUE) # Predictive means plot(meuse.grid_sf3[,c("z1_sd","z2_sd","z3_sd")], pch = 15, cex = 0.5, nbreaks = 20,key.pos=4,axes=TRUE) # Predictive SDset.seed(123) require(sp); require(sf) data(meuse) data(meuse.grid) ### Data y <- log(meuse[,"zinc"]) coords <- meuse[,c("x","y")] x <- data.frame(dist = meuse[,"dist"], ffreq2 = as.integer(meuse$ffreq == 2), ffreq3 = as.integer(meuse$ffreq == 3)) ### Data at prediction sites coords0 <- meuse.grid[,c("x","y")] x0 <- data.frame(dist = meuse.grid[,"dist"], ffreq2 = as.integer(meuse.grid$ffreq == 2), ffreq3 = as.integer(meuse.grid$ffreq == 3)) ### Holdout validation optimizing the number of spatial scales mod_hv <- cf_lm_hv(y = y, x = x, coords = coords, add_learn = "none") ### Spatial modeling and prediction mod <- cf_lm(y = y, x = x, x0 = x0, coords = coords, coords0 = coords0, mod_hv = mod_hv) mod ### Mapping predictive mean and standard deviations (SD) meuse.grid$pred <- mod$pred0$pred meuse.grid$pred_sd<- mod$pred0$pred_sd meuse.grid_sf <- st_as_sf(meuse.grid, coords = c("x","y")) plot(meuse.grid_sf[,"pred"], pch = 15, cex = 0.5, nbreaks = 20) # Predictive mean plot(meuse.grid_sf[,"pred_sd"], pch = 15, cex = 0.5, nbreaks = 20)# Predictive SD ### Multiscale spatial pattern/feature extraction mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth) mod_s2<- sp_scalewise(mod,bw_range=c(500,1000)) # Middle scale (500 <= bandwidth <= 1000) mod_s3<- sp_scalewise(mod,bw_range=c(0,500)) # Small scale (bandwidth <= 500) z1 <- mod_s1$pred0$pred # Predictive mean z2 <- mod_s2$pred0$pred z3 <- mod_s3$pred0$pred z1_sd <- mod_s1$pred0$pred_sd # Predictive SD z2_sd <- mod_s2$pred0$pred_sd z3_sd <- mod_s3$pred0$pred_sd meuse.grid_sf3 <- cbind(meuse.grid_sf, z1, z2, z3, z1_sd, z2_sd, z3_sd) plot(meuse.grid_sf3[,c("z1","z2","z3")], pch = 15, cex = 0.5, nbreaks = 20,key.pos=4,axes=TRUE) # Predictive means plot(meuse.grid_sf3[,c("z1_sd","z2_sd","z3_sd")], pch = 15, cex = 0.5, nbreaks = 20,key.pos=4,axes=TRUE) # Predictive SD
Trains the CFSM-based Gaussian spatial regression and optimizes the number of spatial scales through sequential holdout validation.
cf_lm_hv( y, x = NULL, coords, train_rat = 0.75, id_train = NULL, alpha = 0.9, kernel = "exp", add_learn = "none", seed = 123 )cf_lm_hv( y, x = NULL, coords, train_rat = 0.75, id_train = NULL, alpha = 0.9, kernel = "exp", add_learn = "none", seed = 123 )
y |
Vector of response variables (N x 1). |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
train_rat |
Training sample ratio (default: 0.75). For small to moderate samples (N <= 30000), samples closest to the k-means centers are used for validation samples. For larger samples, training samples are drawn at random. |
id_train |
Optional. If specified, the corresponding samples are used as training samples. Otherwise, training samples are chosen based on 'train_rat'. |
alpha |
Decay ratio of the kernel bandwidth in the coarse-to-fine training (default: 0.9). As it approaches one, the optimization becomes more stringent but requires longer computation time. |
kernel |
Kernel type for modeling spatial dependence. '"exp"' for the exponential kernel (default) and '"gau"' for the Gaussian kernel. |
add_learn |
If '"rf"', random forest is additionally trained to capture non-linear patterns and/or higher-order interactions. Default is '"none"', meaning no additional training. |
seed |
Random seed used for the training/validation split when 'id_train' is not supplied. Defaults to '123', which makes the split reproducible across calls. Set to 'NULL' to allow each call to draw a different split (useful for assessing sensitivity to the split). |
A list with the following elements:
Sum-of-squared error (SSE) for validation samples.
All the SSEs obtained in each learning step.
ID of training samples.
List of other outcomes, which are internally used.
Daisuke Murakami
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2025). Coarse-to-fine spatial GLMMs for scalable prediction and multiscale analysis. *ArXiv*.
Evaluate mean and variance of the spatial process with bandwidth values within a pre-specified range
sp_scalewise(mod, bw_range = c(0, Inf))sp_scalewise(mod, bw_range = c(0, Inf))
mod |
|
bw_range |
Range of bandwidth values of the simulated spatial processes. For example, if bw_range = c(10, 20), spatial processes with bandwidths between 10 and 20 are synthesized and simulated. The default is c(0, Inf), which synthesizes all scales. |
A list with the following elements:
Means and standard deviations of the spatial process (sample sites).
Means and standard deviations of the spatial process
(prediction sites). NULL when mod was fitted without
prediction sites.
Daisuke Murakami