to train GP model with training data set
Usage
gp_train(
X,
Y,
b = NULL,
s2 = 0.3,
optimize = FALSE,
scale = TRUE,
mixed_data = FALSE,
cat_columns = NULL,
Xtest = NULL
)
Arguments
- X
a set of covariate data frame or matrix
- Y
Y vector (outcome variable)
- b
bandwidth (default = NULL)
- s2
noise or a fraction of Y not explained by X (default = 0.3)
- optimize
a logical value to indicate whether an automatic optimized value of S2 should be used. If FALSE, users must define s2. (default = FALSE)
- scale
a logical value to indicate whether covariates should be scaled. (dafault = TRUE)
- mixed_data
a logical value to indicate whether the covariates contain a categorical/binary variable (default = FALSE)
- cat_columns
a character or a numerical vector indicating categorical variables (default = NULL)
- Xtest
a data frame or a matrix of testing covariates. This is necessary when a non-overlapping categorical value exists between training and testing data sets. (default = NULL)
Value
- post_mean_scaled
posterior distribution of Y in a scaled form
- post_mean_orig
posterior distribution of Y in an original scale
- post_cov_scaled
posterior covariance matrix in a scaled form
- post_cov_orig
posterior covariance matrix in an original scale
- K
a kernel matrix of X
- prior_mean_scaled
prior distribution of mean in a scaled form
- X.orig
the original matrix or data set of X
- X.init
the original matrix or data set of X with categorical variables in an expanded form
- X.init.mean
the initial mean values of X
- X.init.sd
the initial standard deviation values of X
- Y.init.mean
the initial mean value of Y
- Y.init.sd
the initial standard deviation value of Y
- K
the kernel matrix of X
- Y
scaled Y
- X
scaled X
- b
bandwidth
- s2
sigma squared
- alpha
alpha value in Rasmussen and Williams (2006) p.19
- L
L value in Rasmussen and Williams (2006) p.19
- mixed_data
a logical value indicating whether X contains a categorical/binary variable
- cat_columns
a character or a numerical vector indicating the location of categorical/binary variables in X
- cat_num
a numerical vector indicating the location of categorical/binary variables in an expanded version of X
- Xcolnames
column names of X
Examples
data(lalonde)
cat_vars <- c("race_ethnicity", "married")
all_vars <- c("age","educ","re74","re75","married", "race_ethnicity")
X <- lalonde[,all_vars]
Y <- lalonde$re78
D <- lalonde$nsw
X_train <- X[D==0,]
Y_train <- Y[D==0]
X_test <- X[D == 1,]
Y_test <- Y[D == 1]
gp_train.out <- gp_train(X = X_train, Y = Y_train, optimize=TRUE, mixed_data = TRUE, cat_columns = cat_vars)
gp_predict.out <- gp_predict(gp_train.out, X_test)