gp_train • gpss

Trains a Gaussian process (GP) model on a training data set.

Usage

gp_train(
  X,
  Y,
  b = NULL,
  s2 = 0.3,
  optimize = FALSE,
  scale = TRUE,
  mixed_data = FALSE,
  cat_columns = NULL,
  Xtest = NULL
)

Arguments

X: a data frame or matrix of covariates
Y: a vector of outcome values
b: bandwidth (default = NULL)
s2: noise, or the fraction of Y not explained by X (default = 0.3)
optimize: a logical value indicating whether an automatically optimized value of s2 should be used. If FALSE, the user must supply s2. (default = FALSE)
scale: a logical value indicating whether the covariates should be scaled (default = TRUE)
mixed_data: a logical value indicating whether the covariates contain categorical/binary variables (default = FALSE)
cat_columns: a character or numeric vector identifying the categorical variables (default = NULL)
Xtest: a data frame or matrix of testing covariates. This is required when a categorical value appears in only one of the training and testing data sets. (default = NULL)
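
The Xtest argument matters because expanding a categorical column into indicator columns depends on which levels are present in the data. The base R sketch below illustrates the issue on made-up data. It uses model.matrix for the expansion, which is an assumption made for illustration and not necessarily how gp_train expands categorical variables internally; the data frames and column names are hypothetical.

```r
# Hypothetical toy data: the test set contains a level ("c") absent from training.
train <- data.frame(age = c(25, 40, 31), race = c("a", "b", "a"),
                    stringsAsFactors = FALSE)
test  <- data.frame(age = c(52, 19), race = c("c", "a"),
                    stringsAsFactors = FALSE)

# Expanding the training data alone yields no column for level "c".
expand_alone <- model.matrix(~ age + race - 1, data = train)

# Fixing the levels from both data sets keeps the expanded columns aligned.
all_levels <- sort(unique(c(train$race, test$race)))
train$race <- factor(train$race, levels = all_levels)
test$race  <- factor(test$race,  levels = all_levels)
expand_train <- model.matrix(~ age + race - 1, data = train)
expand_test  <- model.matrix(~ age + race - 1, data = test)

colnames(expand_alone)  # no "racec" column
colnames(expand_train)  # includes "racec", matching expand_test
```

Without the shared levels, the training and testing matrices would have different numbers of columns, so a kernel between training and testing rows could not be computed consistently.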

Value

post_mean_scaled: posterior mean of Y in scaled form
post_mean_orig: posterior mean of Y on the original scale
post_cov_scaled: posterior covariance matrix in scaled form
post_cov_orig: posterior covariance matrix on the original scale
K: the kernel matrix of X
prior_mean_scaled: prior mean in scaled form
X.orig: the original matrix or data set of X
X.init: the original matrix or data set of X with categorical variables in an expanded form
X.init.mean: the initial mean values of X
X.init.sd: the initial standard deviation values of X
Y.init.mean: the initial mean value of Y
Y.init.sd: the initial standard deviation value of Y
Y: scaled Y
X: scaled X
b: bandwidth
s2: sigma squared
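alpha: the alpha vector defined in Rasmussen and Williams (2006), Gaussian Processes for Machine Learning, p. 19
L: the matrix L defined in Rasmussen and Williams (2006), Gaussian Processes for Machine Learning, p. 19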
mixed_data: a logical value indicating whether X contains a categorical/binary variable
cat_columns: a character or a numerical vector indicating the location of categorical/binary variables in X
cat_num: a numerical vector indicating the location of categorical/binary variables in an expanded version of X
Xcolnames: column names of X
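
The alpha and L entries refer to the Cholesky-based computation of the GP posterior given by Rasmussen and Williams (2006, p. 19). The base R sketch below reproduces that computation on simulated data to show how the returned pieces relate to one another. The Gaussian kernel form exp(-||x_i - x_j||^2 / b), the choice of b as the number of columns, and the back-transformation to the original scale are assumptions made for illustration; they are not confirmed to match the internals of gp_train.

```r
set.seed(1)
# Simulated data (hypothetical): 30 observations, 2 covariates.
n <- 30
X_raw <- matrix(rnorm(n * 2), n, 2)
Y_raw <- sin(X_raw[, 1]) + 0.5 * X_raw[, 2] + rnorm(n, sd = 0.3)

# Standardize covariates and outcome, keeping the outcome moments for later.
X <- scale(X_raw)
Y_mean <- mean(Y_raw)
Y_sd <- sd(Y_raw)
Y <- (Y_raw - Y_mean) / Y_sd

# Assumed Gaussian kernel with bandwidth b (illustrative choice).
b <- ncol(X)
K <- exp(-as.matrix(dist(X))^2 / b)
s2 <- 0.3

# L is the lower-triangular Cholesky factor of K + s2 * I.
L <- t(chol(K + s2 * diag(n)))
# alpha solves (K + s2 * I) alpha = Y using two triangular solves.
alpha <- backsolve(t(L), forwardsolve(L, Y))

# Posterior mean at the training points, scaled and on the original scale.
post_mean_scaled <- as.vector(K %*% alpha)
post_mean_orig <- post_mean_scaled * Y_sd + Y_mean

# Posterior covariance at the training points: K - v'v, where L v = K.
v <- forwardsolve(L, K)
post_cov_scaled <- K - crossprod(v)
post_cov_orig <- post_cov_scaled * Y_sd^2
```

Working with the Cholesky factor rather than an explicit matrix inverse is the numerically stable route the reference recommends, and it explains why both L and alpha are worth storing for later prediction.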

Examples

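library(gpss)

# Load the example data and name the covariates
data(lalonde)
cat_vars <- c("race_ethnicity", "married")
all_vars <- c("age", "educ", "re74", "re75", "married", "race_ethnicity")

X <- lalonde[, all_vars]
Y <- lalonde$re78
D <- lalonde$nsw

# Train on rows with nsw == 0 and hold out rows with nsw == 1
X_train <- X[D == 0, ]
Y_train <- Y[D == 0]
X_test <- X[D == 1, ]
Y_test <- Y[D == 1]

# Fit the GP with an optimized s2, treating cat_vars as categorical
gp_train.out <- gp_train(
  X = X_train, Y = Y_train,
  optimize = TRUE, mixed_data = TRUE, cat_columns = cat_vars
)

# Predict outcomes for the held-out covariates
gp_predict.out <- gp_predict(gp_train.out, X_test)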