Title: | Scalable Geographically Weighted Regression |
---|---|
Description: | Fast and regularized version of GWR for large datasets, detailed in Murakami, Tsutsumida, Yoshida, Nakaya, and Lu (2019) <arXiv:1905.00266>. |
Authors: | Daisuke Murakami[cre,aut], Narumasa Tsutsumida[ctb], Takahiro Yoshida[ctb], Tomoki Nakaya[ctb], Lu Binbin[ctb] |
Maintainer: | Daisuke Murakami <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.2-21 |
Built: | 2025-02-21 04:29:53 UTC |
Source: | https://github.com/cran/scgwr |
This function predicts explained variables and spatially varying coefficients at unobserved sites using the scalable GWR model.
predict0( mod, coords0, x0 = NULL )
mod | Output from the scgwr function |
coords0 | Matrix of spatial point coordinates at predicted sites (N0 x 2) |
x0 | Matrix of explanatory variables at predicted sites (N0 x K). If NULL, explained variables are not predicted (only spatially varying coefficients are predicted). Default is NULL |
pred | Vector of predicted values (N0 x 1) |
b | Matrix of estimated coefficients (N0 x K) |
bse | Matrix of the standard errors for the coefficients (N0 x K) |
t | Matrix of the t-values for the coefficients (N0 x K) |
p | Matrix of the p-values for the coefficients (N0 x K) |
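The returned matrices b, bse, t, and p share the same shape, so they can be combined element-wise. The following is a minimal sketch on synthetic stand-ins for b and bse (not output from predict0). It assumes that a t-value is the coefficient divided by its standard error and uses a normal approximation for two-sided p-values; the package may use a different reference distribution, so treat the p-values here as illustrative only.

```r
# Synthetic stand-ins for pred0$b and pred0$bse (3 sites x 2 coefficients).
# These numbers are made up for illustration.
b   <- matrix(c(0.50, -0.20, 0.10, 1.20, 0.05, -0.90), nrow = 3)
bse <- matrix(c(0.10,  0.10, 0.10, 0.40, 0.40,  0.40), nrow = 3)

# Assumption: t-value = coefficient / standard error
tval <- b / bse

# Assumption: two-sided p-values under a normal approximation
pval <- 2 * pnorm(-abs(tval))

# Flag site-by-coefficient pairs that are significant at the 5% level
sig <- pval < 0.05
sig
```

With real output, the same flagging can be applied directly to the p matrix returned by predict0, for example to map where a coefficient is significant.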
require(spData)
data(boston)

id_obs  <- sample(dim(boston.c)[1], 400)

######################### data at observed sites
y       <- log(boston.c[id_obs, "MEDV"])
x       <- boston.c[id_obs, c("CRIM", "INDUS", "ZN", "NOX", "AGE")]
coords  <- boston.c[id_obs, c("LON", "LAT")]

######################### data at predicted sites
x0      <- boston.c[-id_obs, c("CRIM", "INDUS", "ZN", "NOX", "AGE")]
coords0 <- boston.c[-id_obs, c("LON", "LAT")]

mod     <- scgwr( coords = coords, y = y, x = x )
pred0   <- predict0( mod = mod, coords0 = coords0, x0 = x0 )

pred    <- pred0$pred # predicted value
b       <- pred0$b    # spatially varying coefficients
b[1:5,]

bse     <- pred0$bse  # standard error of the coefficients
bt      <- pred0$t    # t-values
bp      <- pred0$p    # p-values
This function estimates a scalable geographically weighted regression (GWR) model. See scgwr_p for a parallel implementation of the model for very large samples.
scgwr( coords, y, x = NULL, knn = 100, kernel = "gau", p = 4, approach = "CV", nsamp = NULL)
coords | Matrix of spatial point coordinates (N x 2) |
y | Vector of explained variables (N x 1) |
x | Matrix of explanatory variables (N x K). Default is NULL |
knn | Number of nearest neighbors used for geographical weighting. Default is 100. A larger knn is better for larger samples (see Murakami et al., 2019) |
kernel | Kernel to model spatial heterogeneity. Gaussian kernel ("gau") and exponential kernel ("exp") are available |
p | Degree of the polynomial to approximate the kernel function. Default is 4 |
approach | If "CV", leave-one-out cross-validation is used for the model calibration. If "AICc", the corrected Akaike Information Criterion is minimized for the calibration. Default is "CV" |
nsamp | Number of samples used to approximate the cross-validation. The samples are randomly selected. If the value is large enough (e.g., 10,000), error due to the random sampling is quite small owing to the central limit theorem. The value must be smaller than the sample size. Default is NULL |
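To make the roles of knn and kernel concrete, the sketch below computes distance-decay weights for the knn nearest neighbors of one site using textbook Gaussian and exponential kernel forms. It is a generic illustration on synthetic coordinates, not the package's internal code: scgwr approximates the kernel with a polynomial of degree p and estimates its parameters during calibration, whereas the bandwidth bw used here (the distance to the knn-th neighbor) is a hypothetical choice made only for this example.

```r
set.seed(42)

# Synthetic N x 2 coordinates (stand-in for the coords argument)
coords <- cbind(lon = runif(200), lat = runif(200))
knn    <- 10   # number of nearest neighbors to weight
i      <- 1    # index of the target site

# Euclidean distances from site i to every site
d <- sqrt((coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2)

# Indices of the knn nearest neighbors, excluding the site itself
nb <- order(d)[2:(knn + 1)]

# Hypothetical bandwidth: distance to the knn-th nearest neighbor
bw <- d[nb[knn]]

# Textbook kernel forms (assumed here for illustration)
w_gau <- exp(-(d[nb] / bw)^2)   # Gaussian
w_exp <- exp(-d[nb] / bw)       # exponential

round(cbind(distance = d[nb], w_gau, w_exp), 3)
```

In both cases the weight shrinks as the distance grows, so closer observations contribute more to the local regression at the target site.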
b | Matrix of estimated coefficients (N x K) |
bse | Matrix of the standard errors for the coefficients (N x K) |
t | Matrix of the t-values for the coefficients (N x K) |
p | Matrix of the p-values for the coefficients (N x K) |
par | Estimated model parameters including a scale parameter and a shrinkage parameter if penalty = TRUE (see Murakami et al., 2018) |
e | Error statistics. It includes sum of squared errors (SSE), residual standard error (resid_SE), R-squared (R2), adjusted R2 (adjR2), log-likelihood (logLik), corrected Akaike information criterion (AICc), and the cross-validation (CV) score measured by root mean squared error (RMSE) (CV_score(RMSE)) |
pred | Vector of predicted values (N x 1) |
resid | Vector of residuals (N x 1) |
other | Other objects internally used |
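Several of the error statistics listed under e can be reproduced from an observed vector and a vector of predicted values. The sketch below uses synthetic data and the usual textbook definitions. It assumes that residuals are observed minus predicted and that the degrees of freedom equal a plain parameter count k (a hypothetical value here); the package may instead use an effective number of parameters, so its resid_SE and adjR2 need not match these formulas exactly.

```r
set.seed(7)

n <- 100   # sample size
k <- 3     # hypothetical number of parameters used for the degrees of freedom

# Synthetic observed values and predictions (stand-ins for y and res$pred)
y    <- rnorm(n)
pred <- y + rnorm(n, sd = 0.3)

resid <- y - pred                 # assumed convention: observed minus predicted
sse   <- sum(resid^2)             # sum of squared errors
sst   <- sum((y - mean(y))^2)     # total sum of squares

resid_se <- sqrt(sse / (n - k))                     # residual standard error
r2       <- 1 - sse / sst                           # R-squared
adj_r2   <- 1 - (1 - r2) * (n - 1) / (n - k)        # adjusted R-squared
rmse     <- sqrt(mean(resid^2))                     # root mean squared error

round(c(SSE = sse, resid_SE = resid_se, R2 = r2, adjR2 = adj_r2, RMSE = rmse), 4)
```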
Murakami, D., Tsutsumida, N., Yoshida, T., Nakaya, T., and Lu, B. (2019) Scalable GWR: A linear-time algorithm for large-scale geographically weighted regression with polynomial kernels. <arXiv:1905.00266>.
require( spData )
data( boston )
coords <- boston.c[, c("LON", "LAT") ]
y      <- log(boston.c[, "MEDV"])
x      <- boston.c[, c("CRIM", "ZN", "INDUS", "CHAS", "AGE")]
res    <- scgwr( coords = coords, y = y, x = x )
res
Parallel implementation of scalable geographically weighted regression for large samples
scgwr_p( coords, y, x = NULL, knn = 100, kernel = "gau", p = 4, approach = "CV", nsamp = NULL, cl = NULL)
coords | Matrix of spatial point coordinates (N x 2) |
y | Vector of explained variables (N x 1) |
x | Matrix of explanatory variables (N x K). Default is NULL |
knn | Number of nearest neighbors used for geographical weighting. Default is 100. A larger knn is better for larger samples (see Murakami et al., 2019) |
kernel | Kernel to model spatial heterogeneity. Gaussian kernel ("gau") and exponential kernel ("exp") are available |
p | Degree of the polynomial to approximate the kernel function. Default is 4 |
approach | If "CV", leave-one-out cross-validation is used for the model calibration. If "AICc", the corrected Akaike Information Criterion is minimized for the calibration. Default is "CV" |
nsamp | Number of samples used to approximate the cross-validation. The samples are randomly selected. If the value is large enough (e.g., 10,000), error due to the random sampling is quite small owing to the central limit theorem. The value must be smaller than the sample size. Default is NULL |
cl | Number of cores used for the parallel computation. If cl = NULL, which is the default, the number of available cores is detected and used |
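The idea behind nsamp can be illustrated without the package: a leave-one-out RMSE averaged over a random subsample is close to the same score averaged over all observations once the subsample is reasonably large. The sketch below is only an analogy. It uses an ordinary global linear regression fitted with lm and the exact leave-one-out residual identity for linear models, not the scalable GWR calibration, and all variable names and sizes are chosen for illustration.

```r
set.seed(123)

# Synthetic data from a simple linear model
n  <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x1 + x2)

# Exact leave-one-out residuals for a linear model: e_i / (1 - h_ii)
loo_resid <- residuals(fit) / (1 - hatvalues(fit))

# CV score (RMSE) using all observations
cv_full <- sqrt(mean(loo_resid^2))

# CV score approximated from a random subsample of size nsamp
nsamp  <- 1000
id     <- sample(n, nsamp)
cv_sub <- sqrt(mean(loo_resid[id]^2))

c(full = cv_full, subsample = cv_sub)
```

The two scores differ only by sampling error, which shrinks as nsamp grows. That is why a subsample on the order of 10,000 can be enough when the full sample is very large.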
b | Matrix of estimated coefficients (N x K) |
bse | Matrix of the standard errors for the coefficients (N x K) |
t | Matrix of the t-values for the coefficients (N x K) |
p | Matrix of the p-values for the coefficients (N x K) |
par | Estimated model parameters including a scale parameter and a shrinkage parameter if penalty = TRUE (see Murakami et al., 2018) |
e | Error statistics. It includes sum of squared errors (SSE), residual standard error (resid_SE), R-squared (R2), adjusted R2 (adjR2), log-likelihood (logLik), corrected Akaike information criterion (AICc), and the cross-validation (CV) score measured by root mean squared error (RMSE) (CV_score(RMSE)) |
pred | Vector of predicted values (N x 1) |
resid | Vector of residuals (N x 1) |
other | Other objects internally used |
Murakami, D., Tsutsumida, N., Yoshida, T., Nakaya, T., and Lu, B. (2019) Scalable GWR: A linear-time algorithm for large-scale geographically weighted regression with polynomial kernels. <arXiv:1905.00266>.
# require(spData);require(sp)
# data(house)
# dat    <- data.frame(coordinates(house), house@data[,c("price","age","rooms","beds","syear")])
# coords <- dat[ ,c("long","lat")]
# y      <- log(dat[,"price"])
# x      <- dat[,c("age","rooms","beds","syear")]

# Parallel estimation
# res1   <- scgwr_p( coords = coords, y = y, x = x )
# res1

# Parallel estimation + Approximate cross-validation using 10000 samples
# res2   <- scgwr_p( coords = coords, y = y, x = x, nsamp = 10000 )
# res2