Supervised Learning
===================

This vignette provides a quick overview (using simulated data) of how to use ``stochtree`` for supervised learning. Start by loading stochtree's ``BARTModel`` class and a number of other packages.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from stochtree import BARTModel
    from sklearn.model_selection import train_test_split

Next, we generate a simulated prediction problem in which the outcome mean is a step function of the first covariate, scaled by a univariate "basis" ``W``.

.. code-block:: python

    # RNG
    random_seed = 1234
    rng = np.random.default_rng(random_seed)

    # Generate covariates and basis
    n = 1000
    p_X = 10
    p_W = 1
    X = rng.uniform(0, 1, (n, p_X))
    W = rng.uniform(0, 1, (n, p_W))

    # Define the outcome mean function
    def outcome_mean(X, W):
        return np.where(
            (X[:, 0] >= 0.0) & (X[:, 0] < 0.25),
            -7.5 * W[:, 0],
            np.where(
                (X[:, 0] >= 0.25) & (X[:, 0] < 0.5),
                -2.5 * W[:, 0],
                np.where(
                    (X[:, 0] >= 0.5) & (X[:, 0] < 0.75),
                    2.5 * W[:, 0],
                    7.5 * W[:, 0],
                ),
            ),
        )

    # Generate outcome
    epsilon = rng.normal(0, 1, n)
    y = outcome_mean(X, W) + epsilon

    # Standardize outcome
    y_bar = np.mean(y)
    y_std = np.std(y)
    resid = (y - y_bar) / y_std

Split the dataset into train and test sets.

.. code-block:: python

    sample_inds = np.arange(n)
    train_inds, test_inds = train_test_split(
        sample_inds, test_size=0.5, random_state=random_seed
    )
    X_train = X[train_inds, :]
    X_test = X[test_inds, :]
    basis_train = W[train_inds, :]
    basis_test = W[test_inds, :]
    y_train = y[train_inds]
    y_test = y[test_inds]

Initialize and run a BART sampler, retaining 100 MCMC draws after 10 "grow-from-root" warm-start iterations (``num_gfr``).

.. code-block:: python

    bart_model = BARTModel()
    bart_model.sample(
        X_train=X_train,
        y_train=y_train,
        basis_train=basis_train,
        X_test=X_test,
        basis_test=basis_test,
        num_gfr=10,
        num_mcmc=100,
    )
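With the sampler run, we can assess out-of-sample fit. The snippet below is a minimal sketch: it assumes the fitted model stores per-draw test-set predictions in a ``y_hat_test`` attribute of shape ``(n_test, num_draws)``, as in the stochtree vignettes; check the API reference of your installed version if the attribute name differs.

.. code-block:: python

    # Posterior mean prediction on the test set, averaging over retained draws
    y_hat_test = np.squeeze(bart_model.y_hat_test).mean(axis=1)

    # Out-of-sample root mean squared error
    rmse = np.sqrt(np.mean((y_test - y_hat_test) ** 2))
    print(f"Test RMSE: {rmse:.3f}")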
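A quick visual check of the fit, using the ``seaborn`` and ``matplotlib`` imports from above: points clustered around the 45-degree line indicate that the posterior mean tracks the held-out outcomes.

.. code-block:: python

    # Plot posterior mean predictions against held-out outcomes
    sns.scatterplot(x=y_test, y=y_hat_test)
    plt.axline((0, 0), slope=1, color="black", linestyle="--")
    plt.xlabel("Observed y (test)")
    plt.ylabel("Posterior mean prediction")
    plt.show()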
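Finally, predictions at fresh covariate and basis values. This is a sketch under the assumption that ``BARTModel`` exposes a ``predict`` method accepting covariates and a basis and returning one prediction column per retained draw; consult the stochtree documentation for the exact signature and return shape.

.. code-block:: python

    # Sketch: predict at new covariate/basis values (argument names and
    # return shape are assumptions -- verify against the stochtree docs)
    X_new = rng.uniform(0, 1, (5, p_X))
    W_new = rng.uniform(0, 1, (5, p_W))
    y_pred_draws = bart_model.predict(covariates=X_new, basis=W_new)
    y_pred_mean = np.squeeze(y_pred_draws).mean(axis=1)
    print(y_pred_mean)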