Title: Partially Additive (Generalized) Linear Model Trees
Description: This is an implementation of model-based trees with global model parameters (PALM trees). The PALM tree algorithm is an extension to the MOB algorithm (implemented in the 'partykit' package), where some parameters are fixed across all groups. Details about the method can be found in Seibold, Hothorn, Zeileis (2016) <arXiv:1612.07498>. The package offers coef(), logLik(), plot(), and predict() functions for PALM trees.
Authors: Heidi Seibold [aut, cre], Torsten Hothorn [aut], Achim Zeileis [aut]
Maintainer: Heidi Seibold <[email protected]>
License: GPL-2 | GPL-3
Version: 0.9-1
Built: 2024-11-04 05:04:15 UTC
Source: https://github.com/cran/palmtree
Model-based recursive partitioning based on (generalized) linear models with some local (i.e., leaf-specific) and some global (i.e., constant throughout the tree) regression coefficients.
palmtree(formula, data, weights = NULL, family = NULL, lmstart = NULL, abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, plot = FALSE, ...)
formula: formula specifying the response variable and a three-part right-hand side describing the local (i.e., leaf-specific) regressors, the global regressors (i.e., with constant coefficients throughout the tree), and the partitioning variables, respectively. For details see below.
data: data.frame to be used for estimating the model tree.
weights: numeric. An optional numeric vector of weights. (Note that this is passed with standard evaluation, i.e., it is not enough to pass the name of a column in data; the vector itself has to be passed.)
family: either NULL, so that lm/lmtree are used, or a family specification to be passed to glm/glmtree. See the glm documentation for families.
lmstart: numeric. A vector of length nrow(data), to be used as an offset in estimation of the first tree.
abstol: numeric. The convergence criterion used for estimation of the model. When the difference in log-likelihoods of the model from two consecutive iterations is smaller than abstol, estimation of the model tree stops.
maxit: numeric. The maximum number of iterations to be performed in estimation of the model tree.
dfsplit: logical or numeric. as.integer(dfsplit) is the degrees of freedom per selected split employed when extracting the log-likelihood.
verbose: Should the log-likelihood value of the estimated model be printed for every iteration of the estimation?
plot: Should the tree be plotted at every iteration of the estimation? Note that selecting this option slows down execution of the function.
...: Additional arguments to be passed to lmtree or glmtree (see mob_control for details).
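For illustration, a hypothetical call with non-default arguments might look as follows. The simulated data (d2, ybin) are made up for this sketch, and it is assumed, as the '...' entry above suggests, that additional arguments such as maxdepth are forwarded to lmtree/glmtree and hence to partykit::mob_control().

set.seed(1)
d2 <- data.frame(a = factor(rbinom(400, 1, 0.5)), x3 = rnorm(400),
                 z1 = rnorm(400), z2 = rnorm(400))
d2$ybin <- rbinom(400, 1,
                  plogis(0.5 * d2$x3 + ifelse(d2$z1 > 0, 1, -1) * (d2$a == "1")))

pt <- palmtree(ybin ~ a | x3 | z1 + z2, data = d2,
               family = binomial,   # glm/glmtree instead of lm/lmtree
               abstol = 1e-4,       # stricter convergence tolerance
               maxit = 50,          # at most 50 iterations of the algorithm
               maxdepth = 3)        # assumed to be forwarded to mob_control() via '...'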
Partially additive (generalized) linear model (PALM) trees learn a tree where each terminal node is associated with different regression coefficients while adjusting for additional global regression effects. This allows for detection of subgroup-specific coefficients with respect to selected covariates, while keeping the remaining regression coefficients constant throughout the tree. The estimation algorithm iterates between (1) estimation of the tree given an offset of the global effects, and (2) estimation of the global regression effects given the tree structure.
To specify all variables in the model a formula such as y ~ x1 + x2 | x3 | z1 + z2 + z3 is used, where y is the response, x1 and x2 are the regressors in every node of the tree, x3 has a global regression coefficient, and z1 to z3 are the partitioning variables considered for growing the tree.
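As a rough illustration (not the package's internal code), the following sketch carries out one iteration of the alternating estimation by hand, using partykit::lmtree for the local part and lm for the global part. The simulated data-generating process and the helper column 'node' are made up for this sketch.

library("partykit")

set.seed(123)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                z1 = rnorm(n), z2 = rnorm(n), z3 = rnorm(n))
d$y <- 1 + d$x1 + 0.5 * d$x3 + ifelse(d$z1 > 0, 2, -2) * d$x2 + rnorm(n)

## (1) grow a tree for the local regressors only (initial global offset is zero)
tr <- lmtree(y ~ x1 + x2 | z1 + z2 + z3, data = d)

## (2) given the tree structure, estimate local and global effects jointly:
## node-specific intercepts and slopes for x1, x2 plus one global slope for x3
d$node <- factor(predict(tr, newdata = d, type = "node"))
glob <- lm(y ~ node / (x1 + x2) + x3 - 1, data = d)
coef(glob)["x3"]  ## current estimate of the global coefficient

## In the next iteration the tree would be re-grown with the fitted global
## contribution (here coef(glob)["x3"] * d$x3) supplied as an offset, and so
## on until the log-likelihood changes by less than 'abstol'. palmtree()
## automates this loop; the corresponding call would be
## palmtree(y ~ x1 + x2 | x3 | z1 + z2 + z3, data = d)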
The code is still under development and might change in future versions.
The function returns a list with the following objects:
formula: The formula as specified with the formula argument.
call: The matched call.
tree: The final lmtree/glmtree.
palm: The final lm/glm model.
data: The dataset specified with the data argument.
nobs: Number of observations.
loglik: The log-likelihood value of the last iteration.
df: Degrees of freedom.
dfsplit: Degrees of freedom per selected split as specified with the dfsplit argument.
iterations: The number of iterations used to estimate the palmtree.
maxit: The maximum number of iterations specified with the maxit argument.
lmstart: Offset in estimation of the first tree as specified in the lmstart argument.
abstol: The prespecified value for the change in log-likelihood to evaluate convergence, as specified with the abstol argument.
intercept: Logical specifying if an intercept was computed.
family: The family object used.
mob.control: A list containing control parameters passed to lmtree/glmtree.
Sies A, Van Mechelen I (2015). Comparing Four Methods for Estimating Tree-Based Treatment Regimes. Unpublished Manuscript.
## one DGP from Sies and Van Mechelen (2015)
dgp <- function(nobs = 1000, nreg = 5, creg = 0.4, ptreat = 0.5, sd = 1,
                coef = c(1, 0.25, 0.25, 0, 0, -0.25), eff = 1) {
  d <- mvtnorm::rmvnorm(nobs, mean = rep(0, nreg),
                        sigma = diag(1 - creg, nreg) + creg)
  colnames(d) <- paste0("x", 1:nreg)
  d <- as.data.frame(d)
  d$a <- rbinom(nobs, size = 1, prob = ptreat)
  d$err <- rnorm(nobs, mean = 0, sd = sd)
  gopt <- function(d) {
    as.numeric(d$x1 > -0.545) * as.numeric(d$x2 < 0.545)
  }
  d$y <- coef[1] + drop(as.matrix(d[, paste0("x", 1:5)]) %*% coef[-1]) -
    eff * (d$a - gopt(d))^2 + d$err
  d$a <- factor(d$a)
  return(d)
}
set.seed(1)
d <- dgp()

## estimate PALM tree with correctly specified global (partially
## additive) regressors and all variables considered for partitioning
palm <- palmtree(y ~ a | x1 + x2 + x5 | x1 + x2 + x3 + x4 + x5, data = d)
print(palm)
plot(palm)

## query coefficients
coef(palm, model = "tree")
coef(palm, model = "palm")
coef(palm, model = "all")
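A few additional illustrative lines, continuing from the fitted object above: the component names follow the returned list described earlier, and logLik()/predict() are the methods mentioned in the Description (the predict() call assumes the usual newdata interface).

palm$iterations                   # number of iterations until convergence
palm$loglik                       # log-likelihood of the last iteration
logLik(palm)                      # same information via the logLik() method
palm$tree                         # the final lmtree (local coefficients)
palm$palm                         # the final lm model (see the list above)
head(predict(palm, newdata = d))  # predictions for (here) the training data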