Run the Bayesian Causal Forest (BCF) algorithm for regularized causal effect estimation.
bcf.Rd
Run the Bayesian Causal Forest (BCF) algorithm for regularized causal effect estimation.
Usage
bcf(
X_train,
Z_train,
y_train,
pi_train = NULL,
group_ids_train = NULL,
rfx_basis_train = NULL,
X_test = NULL,
Z_test = NULL,
pi_test = NULL,
group_ids_test = NULL,
rfx_basis_test = NULL,
cutpoint_grid_size = 100,
sigma_leaf_mu = NULL,
sigma_leaf_tau = NULL,
alpha_mu = 0.95,
alpha_tau = 0.25,
beta_mu = 2,
beta_tau = 3,
min_samples_leaf_mu = 5,
min_samples_leaf_tau = 5,
max_depth_mu = 10,
max_depth_tau = 5,
a_global = 0,
b_global = 0,
a_leaf_mu = 3,
a_leaf_tau = 3,
b_leaf_mu = NULL,
b_leaf_tau = NULL,
q = 0.9,
sigma2 = NULL,
pct_var_sigma2_init = 0.25,
variable_weights = NULL,
keep_vars_mu = NULL,
drop_vars_mu = NULL,
keep_vars_tau = NULL,
drop_vars_tau = NULL,
num_trees_mu = 250,
num_trees_tau = 50,
num_gfr = 5,
num_burnin = 0,
num_mcmc = 100,
sample_sigma_global = T,
sample_sigma_leaf_mu = T,
sample_sigma_leaf_tau = F,
propensity_covariate = "mu",
adaptive_coding = T,
b_0 = -0.5,
b_1 = 0.5,
rfx_prior_var = NULL,
random_seed = -1,
keep_burnin = F,
keep_gfr = F,
verbose = F
)
Arguments
- X_train
Covariates used to split trees in the ensemble. May be provided either as a dataframe or a matrix. Matrix covariates will be assumed to be all numeric. Covariates passed as a dataframe will be preprocessed based on the variable types (e.g. categorical columns stored as unordered factors will be one-hot encoded, categorical columns stored as ordered factors will passed as integers to the core algorithm, along with the metadata that the column is ordered categorical).
- Z_train
Vector of (continuous or binary) treatment assignments.
- y_train
Outcome to be modeled by the ensemble.
- pi_train
(Optional) Vector of propensity scores. If not provided, this will be estimated from the data.
- group_ids_train
(Optional) Group labels used for an additive random effects model.
- rfx_basis_train
(Optional) Basis for "random-slope" regression in an additive random effects model. If
group_ids_train
is provided with a regression basis, an intercept-only random effects model will be estimated.- X_test
(Optional) Test set of covariates used to define "out of sample" evaluation data. May be provided either as a dataframe or a matrix, but the format of
X_test
must be consistent with that ofX_train
.- Z_test
(Optional) Test set of (continuous or binary) treatment assignments.
- pi_test
(Optional) Vector of propensity scores. If not provided, this will be estimated from the data.
- group_ids_test
(Optional) Test set group labels used for an additive random effects model. We do not currently support (but plan to in the near future), test set evaluation for group labels that were not in the training set.
- rfx_basis_test
(Optional) Test set basis for "random-slope" regression in additive random effects model.
- cutpoint_grid_size
Maximum size of the "grid" of potential cutpoints to consider. Default: 100.
- sigma_leaf_mu
Starting value of leaf node scale parameter for the prognostic forest. Calibrated internally as
2/num_trees_mu
if not set here.- sigma_leaf_tau
Starting value of leaf node scale parameter for the treatment effect forest. Calibrated internally as
1/num_trees_tau
if not set here.- alpha_mu
Prior probability of splitting for a tree of depth 0 for the prognostic forest. Tree split prior combines
alpha
andbeta
viaalpha*(1+node_depth)^-beta
. Default: 0.95.- alpha_tau
Prior probability of splitting for a tree of depth 0 for the treatment effect forest. Tree split prior combines
alpha
andbeta
viaalpha*(1+node_depth)^-beta
. Default: 0.25.- beta_mu
Exponent that decreases split probabilities for nodes of depth > 0 for the prognostic forest. Tree split prior combines
alpha
andbeta
viaalpha*(1+node_depth)^-beta
. Default: 2.0.- beta_tau
Exponent that decreases split probabilities for nodes of depth > 0 for the treatment effect forest. Tree split prior combines
alpha
andbeta
viaalpha*(1+node_depth)^-beta
. Default: 3.0.- min_samples_leaf_mu
Minimum allowable size of a leaf, in terms of training samples, for the prognostic forest. Default: 5.
- min_samples_leaf_tau
Minimum allowable size of a leaf, in terms of training samples, for the treatment effect forest. Default: 5.
- max_depth_mu
Maximum depth of any tree in the mu ensemble. Default: 10. Can be overriden with
-1
which does not enforce any depth limits on trees.- max_depth_tau
Maximum depth of any tree in the tau ensemble. Default: 5. Can be overriden with
-1
which does not enforce any depth limits on trees.- a_global
Shape parameter in the
IG(a_global, b_global)
global error variance model. Default: 0.- b_global
Scale parameter in the
IG(a_global, b_global)
global error variance model. Default: 0.- a_leaf_mu
Shape parameter in the
IG(a_leaf, b_leaf)
leaf node parameter variance model for the prognostic forest. Default: 3.- a_leaf_tau
Shape parameter in the
IG(a_leaf, b_leaf)
leaf node parameter variance model for the treatment effect forest. Default: 3.- b_leaf_mu
Scale parameter in the
IG(a_leaf, b_leaf)
leaf node parameter variance model for the prognostic forest. Calibrated internally as 0.5/num_trees if not set here.- b_leaf_tau
Scale parameter in the
IG(a_leaf, b_leaf)
leaf node parameter variance model for the treatment effect forest. Calibrated internally as 0.5/num_trees if not set here.- q
Quantile used to calibrated
lambda
as in Sparapani et al (2021). Default: 0.9.- sigma2
Starting value of global error variance parameter. Calibrated internally as
pct_var_sigma2_init*var((y-mean(y))/sd(y))
if not set.- pct_var_sigma2_init
Percentage of standardized outcome variance used to initialize global error variance parameter. Default: 0.25. Superseded by
sigma2
.- variable_weights
Numeric weights reflecting the relative probability of splitting on each variable. Does not need to sum to 1 but cannot be negative. Defaults to
rep(1/ncol(X_train), ncol(X_train))
if not set here. Note that if the propensity score is included as a covariate in either forest, its weight will default to1/ncol(X_train)
. A workaround if you wish to provide a custom weight for the propensity score is to include it as a column inX_train
and then setpropensity_covariate
to'none'
adjustkeep_vars_mu
andkeep_vars_tau
accordingly.- keep_vars_mu
Vector of variable names or column indices denoting variables that should be included in the prognostic (
mu(X)
) forest. Default: NULL.- drop_vars_mu
Vector of variable names or column indices denoting variables that should be excluded from the prognostic (
mu(X)
) forest. Default: NULL. If bothdrop_vars_mu
andkeep_vars_mu
are set,drop_vars_mu
will be ignored.- keep_vars_tau
Vector of variable names or column indices denoting variables that should be included in the treatment effect (
tau(X)
) forest. Default: NULL.- drop_vars_tau
Vector of variable names or column indices denoting variables that should be excluded from the treatment effect (
tau(X)
) forest. Default: NULL. If bothdrop_vars_tau
andkeep_vars_tau
are set,drop_vars_tau
will be ignored.- num_trees_mu
Number of trees in the prognostic forest. Default: 200.
- num_trees_tau
Number of trees in the treatment effect forest. Default: 50.
- num_gfr
Number of "warm-start" iterations run using the grow-from-root algorithm (He and Hahn, 2021). Default: 5.
- num_burnin
Number of "burn-in" iterations of the MCMC sampler. Default: 0.
- num_mcmc
Number of "retained" iterations of the MCMC sampler. Default: 100.
- sample_sigma_global
Whether or not to update the
sigma^2
global error variance parameter based onIG(nu, nu*lambda)
. Default: T.- sample_sigma_leaf_mu
Whether or not to update the
sigma_leaf_mu
leaf scale variance parameter in the prognostic forest based onIG(a_leaf_mu, b_leaf_mu)
. Default: T.- sample_sigma_leaf_tau
Whether or not to update the
sigma_leaf_tau
leaf scale variance parameter in the treatment effect forest based onIG(a_leaf_tau, b_leaf_tau)
. Default: T.- propensity_covariate
Whether to include the propensity score as a covariate in either or both of the forests. Enter "none" for neither, "mu" for the prognostic forest, "tau" for the treatment forest, and "both" for both forests. If this is not "none" and a propensity score is not provided, it will be estimated from (
X_train
,Z_train
) usingstochtree::bart()
. Default: "mu".- adaptive_coding
Whether or not to use an "adaptive coding" scheme in which a binary treatment variable is not coded manually as (0,1) or (-1,1) but learned via parameters
b_0
andb_1
that attach to the outcome model[b_0 (1-Z) + b_1 Z] tau(X)
. This is ignored when Z is not binary. Default: T.- b_0
Initial value of the "control" group coding parameter. This is ignored when Z is not binary. Default: -0.5.
- b_1
Initial value of the "treatment" group coding parameter. This is ignored when Z is not binary. Default: 0.5.
- rfx_prior_var
Prior on the (diagonals of the) covariance of the additive group-level random regression coefficients. Must be a vector of length
ncol(rfx_basis_train)
. Default:rep(1, ncol(rfx_basis_train))
- random_seed
Integer parameterizing the C++ random number generator. If not specified, the C++ random number generator is seeded according to
std::random_device
.- keep_burnin
Whether or not "burnin" samples should be included in cached predictions. Default FALSE. Ignored if num_mcmc = 0.
- keep_gfr
Whether or not "grow-from-root" samples should be included in cached predictions. Default FALSE. Ignored if num_mcmc = 0.
- verbose
Whether or not to print progress during the sampling loops. Default: FALSE.
Value
List of sampling outputs and a wrapper around the sampled forests (which can be used for in-memory prediction on new data, or serialized to JSON on disk).
Examples
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- as.numeric(rbinom(n,1,0.5))
x5 <- as.numeric(sample(1:3,n,replace=TRUE))
X <- cbind(x1,x2,x3,x4,x5)
p <- ncol(X)
g <- function(x) {ifelse(x[,5]==1,2,ifelse(x[,5]==2,-1,4))}
mu1 <- function(x) {1+g(x)+x[,1]*x[,3]}
mu2 <- function(x) {1+g(x)+6*abs(x[,3]-1)}
tau1 <- function(x) {rep(3,nrow(x))}
tau2 <- function(x) {1+2*x[,2]*x[,4]}
mu_x <- mu1(X)
tau_x <- tau2(X)
pi_x <- 0.8*pnorm((3*mu_x/sd(mu_x)) - 0.5*X[,1]) + 0.05 + runif(n)/10
Z <- rbinom(n,1,pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 4
y <- E_XZ + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
X <- as.data.frame(X)
X$x4 <- factor(X$x4, ordered = TRUE)
X$x5 <- factor(X$x5, ordered = TRUE)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train, pi_train = pi_train,
X_test = X_test, Z_test = Z_test, pi_test = pi_test)
# plot(rowMeans(bcf_model$mu_hat_test), mu_test, xlab = "predicted", ylab = "actual", main = "Prognostic function")
# abline(0,1,col="red",lty=3,lwd=3)
# plot(rowMeans(bcf_model$tau_hat_test), tau_test, xlab = "predicted", ylab = "actual", main = "Treatment effect")
# abline(0,1,col="red",lty=3,lwd=3)