Class that provides functionality for defining and computing a statistical kernel based on shared leaf membership of observations in a tree ensemble.
Computes leaf membership internally as a sparse matrix and then calculates a dense kernel from that sparse matrix, with both steps performed in C++.
Methods
Method compute_leaf_indices()
Compute the leaf indices of each tree in the ensemble for every observation in a dataset.
Stores the result internally; it can be extracted from the class via a call to get_leaf_indices.
Usage
ForestKernel$compute_leaf_indices(
covariates_train,
covariates_test = NULL,
forest_container,
forest_num
)
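The exact layout returned by get_leaf_indices is not described on this page. As a rough illustration of what "leaf indices of each tree for every observation" can mean for a whole ensemble, the base-R sketch below assumes a hypothetical per-tree assignment matrix leaf_assign (one column per tree, holding the local leaf number within that tree). It converts those into global leaf column indices by offsetting each tree by the number of leaves in all preceding trees, so that every leaf in the ensemble gets its own column. The names leaf_assign, leaves_per_tree and global_leaf_indices are hypothetical and not part of the ForestKernel API; the representation stored by the class may differ.

```r
# Hypothetical helper (not part of the ForestKernel API): map per-tree (local)
# leaf numbers to global leaf columns by offsetting each tree by the total
# number of leaves in all preceding trees.
# leaf_assign: n x num_trees integer matrix, column t in 1..leaves_per_tree[t]
global_leaf_indices <- function(leaf_assign, leaves_per_tree) {
  offsets <- c(0L, cumsum(leaves_per_tree)[-length(leaves_per_tree)])
  # sweep() adds offsets[t] to every entry of column t
  sweep(leaf_assign, 2, offsets, FUN = "+")
}

# Toy ensemble: 3 observations, 2 trees with 2 and 3 leaves respectively
leaf_assign <- matrix(c(1L, 2L, 1L,   # tree 1 (local leaves 1-2)
                        3L, 1L, 3L),  # tree 2 (local leaves 1-3)
                      nrow = 3)
leaves_per_tree <- c(2L, 3L)
idx <- global_leaf_indices(leaf_assign, leaves_per_tree)
print(idx)  # rows: (1, 5), (2, 3), (1, 5); tree 2 uses global columns 3-5
```

With this layout, the total number of columns of the membership matrix W is sum(leaves_per_tree), matching the description of W_train under Returns.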
Method compute_kernel()
Compute the kernel implied by a tree ensemble. This function calls compute_leaf_indices internally, so it is not necessary to call both.
compute_leaf_indices is exposed at the class level so that the vector of leaf indices for an ensemble can be extracted directly in R.
Usage
ForestKernel$compute_kernel(
covariates_train,
covariates_test = NULL,
forest_container,
forest_num
)
Arguments
covariates_train
Matrix of training set covariates at which to assess the ensemble kernel
covariates_test
(Optional) Matrix of test set covariates at which to assess the ensemble kernel
forest_container
Object of type ForestSamples
forest_num
Index of the forest in forest_container to be assessed
Returns
List of matrices. If covariates_test = NULL, the list contains one n_train x n_train matrix, where n_train = nrow(covariates_train).
This matrix is the kernel defined by W_train %*% t(W_train), where W_train is the leaf membership matrix for covariates_train, with n_train rows and as many columns as there are total leaves in the ensemble.
If covariates_test is not NULL, the list contains two more matrices, W_test %*% t(W_train) (n_test x n_train) and W_test %*% t(W_test) (n_test x n_test), where W_test is the analogous leaf membership matrix for covariates_test and n_test = nrow(covariates_test).
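To make the matrix products above concrete, here is a small base-R sketch of the same computation. It assumes that W is a 0/1 indicator matrix (an entry is 1 when the observation falls in that leaf) and that each observation's leaves are given as global column indices; both the encoding and the helper names leaf_membership and kernel_matrices are assumptions for illustration, not the package's C++ implementation, which works with a sparse matrix. A dense matrix is used here only for readability.

```r
# Hypothetical helper (not part of the ForestKernel API): dense 0/1 leaf
# membership matrix with one row per observation and one column per leaf.
# leaf_idx: n x num_trees matrix of 1-based global leaf column indices.
leaf_membership <- function(leaf_idx, num_leaves) {
  W <- matrix(0, nrow = nrow(leaf_idx), ncol = num_leaves)
  # Pair each observation's row with the leaf column it reaches in every tree
  rows <- rep(seq_len(nrow(leaf_idx)), times = ncol(leaf_idx))
  W[cbind(rows, as.vector(leaf_idx))] <- 1
  W
}

# Hypothetical helper: kernel matrices in the order listed under Returns
kernel_matrices <- function(W_train, W_test = NULL) {
  out <- list(W_train %*% t(W_train))                  # n_train x n_train
  if (!is.null(W_test)) {
    out <- c(out, list(W_test %*% t(W_train),          # n_test x n_train
                       W_test %*% t(W_test)))          # n_test x n_test
  }
  out
}

# Toy ensemble: 2 trees with 2 and 3 leaves (global leaf columns 1-2 and 3-5)
train_idx <- matrix(c(1, 2, 1,    # tree 1
                      5, 3, 5),   # tree 2
                    nrow = 3)
test_idx <- matrix(c(2, 1,        # tree 1
                     5, 4),       # tree 2
                   nrow = 2)
W_train <- leaf_membership(train_idx, num_leaves = 5)
W_test  <- leaf_membership(test_idx,  num_leaves = 5)
K <- kernel_matrices(W_train, W_test)
K[[1]]  # diagonal is 2 (two trees); observations 1 and 3 share a leaf in both
```

Under this 0/1 encoding, entry (i, j) of each matrix counts the trees in which the two observations land in the same leaf, so the diagonal of the training kernel equals the number of trees in the ensemble.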