Tree

class survivors.tree.node.Node(df, numb=0, full_rule=[], depth=0, features=[], categ=[], woe=False, verbose=0, **info)

Node of a decision tree. Splits its data into two child nodes.

Attributes

df (Pandas DataFrame): Data of the Node
numb (int): Number or name of the Node
full_rule (list): Rules from the root
depth (int): Distance from the root
edges (array-like): Numbers of child nodes
rule_edges (array-like): Rules for child nodes
features (list): Available features
categ (list): Names of categorical features
woe (boolean): Mode of categorical preparation
is_leaf (boolean): True if node is terminal (there are no child nodes)
verbose (int): Print the best split of the node
info (dict): Parameters for finding the best splitting
leaf_model (LeafModel): Stratified model by parameter “leaf_model” (map with LEAF_MODEL_DICT)

Methods

check_params : Fill empty parameters and map max_features to int
find_best_split : Choose best split of node according to parameters
split : Find best split of df sample and create child nodes
set_edges : Set number of child nodes from hash table of main tree
set_leaf : Delete child nodes and set node as terminal

predict : Return statistic values of a data
predict_scheme : Return all possible outcomes for additional features determination

prepare_df_for_attr : Set input values to numpy format (and fill missing features)
get_edges : Define appropriate child nodes according to input values
get_full_rule : Convert full_rules to a string format

get_figure : Create picture of a data (hist, survival function)
get_description : Return common values of a data (size, depth, death, cens)

set_dot_node : Add self node to the graphviz dot
set_dot_edges : Add child nodes to the graphviz dot
translate : Replace rules and features by dictionary

find_best_split(): Sort through all combinations of splitting the sample by features. Find a split with the highest statistical value.

get_comb_fast(features): Create set of all triplets with two target variables and one splitting feature

get_description(lang='en')

Build description for graphviz node

Returns

label (str): Multiline string of description

get_figure(mode='hist', bins=None, target='cens', save_path='', lang='en')

Save background image for graphviz node

Parameters

target (str): Described feature
mode (str): Type of plot
    “hist”: plot histogram of target
    “kde”: plot smooth density of target
    “surv”: plot survival function (doesn’t use target)
bins (list): Points for survival function
save_path (str): External path to save image

ind_for_nodes(X_attr, best_split, is_categ): Map each sample to the number of the corresponding child node, according to the rule and the sample's feature values
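One way to picture this mapping is a function that sends a single feature value to the left (0) or right (1) child. The sketch below is an assumption about how such a rule could work, including the handling of missing values; the name `child_index` and its parameters are hypothetical and do not come from the library.

```python
import math


def child_index(value, threshold=None, left_categories=None, nan_child=0):
    """Return 0 (left child) or 1 (right child) for one feature value."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        # Missing values follow the child that was flagged to receive them
        return nan_child
    if left_categories is not None:
        # Categorical rule: membership test
        return 0 if value in left_categories else 1
    # Numeric rule: threshold test
    return 0 if value <= threshold else 1


print([child_index(v, threshold=5) for v in (3, 7, float("nan"))])
# [0, 1, 0]
```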

predict(X, target, bins=None)

Predict target values for X data

Parameters

X (Pandas dataframe): Contains input features of events
target (str or function): Column name, mode or aggregate function of a leaf sample
    Column name: must be in dataset.columns; return mean of feature
    Mode:
        “surv”: return survival function
        “hazard”: return cumulative hazard function
        attribute_name: return attribute value (e.g. depth, numb)
        feature_name: return aggregate statistic value for node
bins (array-like): Points of timeline

Returns

res (array-like): Values by target
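For the “surv” mode, a survival function evaluated at the points in bins can be sketched with the Kaplan-Meier estimator. This is a generic illustration: the estimator actually used in a leaf depends on the configured leaf model, and the function name and the convention that events equals 1 when the event occurred are assumptions.

```python
def kaplan_meier_at(times, events, bins):
    """Kaplan-Meier survival estimate evaluated at each point in bins.

    times  : observed durations
    events : 1 if the event occurred, 0 if the observation was censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    steps = []  # (event time, survival value from that time onward)
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - deaths / at_risk
        steps.append((t, surv))
    result = []
    for b in bins:
        value = 1.0  # before the first event nobody has died
        for t, s in steps:
            if t <= b:
                value = s
        result.append(value)
    return result


print(kaplan_meier_at([1, 2, 3, 4], [1, 0, 1, 1], bins=[0, 1, 2.5, 3, 5]))
# [1.0, 0.75, 0.75, 0.375, 0.0]
```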

predict_scheme(X, scheme_feats)

Return the scheme of possible outcomes for X data (used to determine additional features)

Parameters

X (Pandas dataframe): Contains input features of events
scheme_feats (list): Needed features from node

Returns

res (Scheme): Aggregation entity of node information

set_dot_edges(dot): Set edges for graphviz by node structure

set_dot_node(dot, path_dir='', depth=None, lang='en', **args): Set node for graphviz dot with image and label

split(): Find best split of df sample and create child nodes

translate(describe): Rename features in node by dictionary

class survivors.tree.node.Rule(feature: str, condition: str, has_nan: int)

Splitting rule of a decision tree node: a feature, a condition on it, and a flag for missing values.

Attributes

feature (str): Name of feature for splitting
condition (str): Operation for splitting
has_nan (bool): Flag of the missing values in node

Methods

get_feature : Return feature
get_condition : Return condition
translate : Replace rule by dictionary
to_str : Transform to linear form
print : Print all attributes and descriptions

print(): Print all attributes and descriptions

translate(describe: dict): Rename feature in rule
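The shape of such a rule object can be sketched with a small dataclass. `RuleSketch` is a hypothetical stand-in, not the library class: the in-place renaming and the string format produced by `to_str` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuleSketch:
    feature: str
    condition: str
    has_nan: int

    def translate(self, describe: dict) -> None:
        # Rename the feature if the dictionary has an entry for it
        self.feature = describe.get(self.feature, self.feature)

    def to_str(self) -> str:
        # Hypothetical linear form; the real output format may differ
        nan_part = " or is missing" if self.has_nan else ""
        return f"({self.feature} {self.condition}){nan_part}"


rule = RuleSketch("age", "<= 50", 1)
rule.translate({"age": "Age, years"})
print(rule.to_str())
# (Age, years <= 50) or is missing
```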

class survivors.tree.decision_tree.CRAID(depth=0, random_state=123, features=[], categ=[], cut=False, balance=None, **info)

Survival decision tree model.

Attributes

nodes (dict): Dictionary of all tree’s nodes (numbers from hierarchy)
cut (boolean): Flag of pruning
balance (boolean): Flag of source data balancing
depth (int): Maximal depth of nodes
features (list): Available features
categ (list): Names of categorical features
random_state (int): Fixed seed for building reproducibility
name (str): Model’s name
bins (array-like): Points of timeline
info (dict): Parameters for building nodes

Methods

fit : Build decision tree with X, y data (iterative splitting node)
predict : Return values of features, rules or schemes
predict_at_times : Return survival or hazard function
predict_schemes : Return FilledSchemeStrategy or Scheme
cut_tree : Pruning function

visualize : Build graphviz Digraph for each node
translate : Replace rules and features by dictionary

get_leaf_numbers : Return leaf numbers from nodes
get_spanning_leaf_numbers : Return pre-leaves numbers from nodes
delete_leaves_by_span : Set up pre-leaves from lists to leaves

cut_tree(X, target, mode_f=<function roc_auc_score>, choose_f=<built-in function max>)

Prune the tree: find the subtree that achieves the best value of the “mode_f” metric.

Parameters

X (Pandas dataframe): Contains input features of events.
target (str): Feature name for metric counting.
mode_f (function, optional): Metric for selecting. The default is roc_auc_score.
choose_f (function, optional): Type of best value (max or min). The default is max.
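The roles of mode_f and choose_f can be sketched as follows: each candidate subtree is scored with the metric, and choose_f picks the best score. This is an illustration under assumptions, not the library's pruning procedure: the helper names are hypothetical, candidates are represented simply by their predictions, and a plain accuracy function replaces roc_auc_score to keep the example dependency-free.

```python
def accuracy(y_true, y_pred):
    """Stand-in metric with the same (y_true, y_pred) call shape as roc_auc_score."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def choose_subtree(candidates, y_true, mode_f, choose_f=max):
    """Score every candidate with mode_f and select one with choose_f."""
    scores = {name: mode_f(y_true, pred) for name, pred in candidates.items()}
    best_score = choose_f(scores.values())
    # On ties, the first candidate with the best score wins
    best_name = next(n for n, s in scores.items() if s == best_score)
    return best_name, best_score


y_true = [1, 0, 1, 1]
candidates = {
    "full": [1, 0, 0, 1],
    "pruned_at_2": [1, 0, 1, 1],
    "root_only": [1, 1, 1, 1],
}
print(choose_subtree(candidates, y_true, accuracy))
# ('pruned_at_2', 1.0)
```

Passing choose_f=min instead selects the candidate with the lowest score, which suits error-type metrics.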

delete_leaves_by_span(list_span_leaf): Set pre-termination nodes as leaves

get_spanning_leaf_numbers(): Get pre-termination nodes (have two leaves in edges)
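The two operations above can be pictured on a tree stored as a dictionary from node number to the list of its child numbers. The representation and both function bodies are assumptions for illustration; in the library the nodes are Node objects with their own edges.

```python
def spanning_leaf_numbers(edges_by_node):
    """Return numbers of nodes whose children are all leaves."""
    leaves = {n for n, edges in edges_by_node.items() if len(edges) == 0}
    return [
        n
        for n, edges in edges_by_node.items()
        if len(edges) > 0 and all(e in leaves for e in edges)
    ]


def delete_leaves_by_span(edges_by_node, span_numbers):
    """Remove the children of the given nodes and turn those nodes into leaves."""
    for numb in span_numbers:
        for child in edges_by_node[numb]:
            del edges_by_node[child]
        edges_by_node[numb] = []


tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
print(spanning_leaf_numbers(tree))  # [1]
delete_leaves_by_span(tree, [1])
print(tree)                         # {0: [1, 2], 1: [], 2: []}
print(spanning_leaf_numbers(tree))  # [0]
```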

predict(X, mode='target', target='time', end_list=[], bins=None)

Return values by mode & target

Parameters

X (Pandas dataframe): Contains input features of events.
mode (str, optional): Mode of predicting. The default is “target”.
    “surv”: return values of survival function in bins
    “hazard”: return values of hazard function in bins
    “target”: return values of feature (in target variable)
    “rules”: return full rules from node to leaf
target (str or list, optional): Target of the prediction. The default is “time”.
end_list (list, optional): Numbers of endpoint nodes (for cutting)
bins (array-like, optional): Points of timeline

Returns

res (array-like): Values by mode & target

predict_at_times(X, bins, mode='surv')

Return survival or hazard function.

Parameters

X (Pandas dataframe): Contains input features of events.
bins (array-like): Points of timeline.
mode (str, optional): Type of function. The default is “surv”.
    “surv”: build the survival function in nodes
    “hazard”: build the hazard function in nodes

Returns

res (array-like): Vector of function values in times (bins)
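For the “hazard” mode, a cumulative hazard evaluated at the points in bins can be sketched with the Nelson-Aalen estimator. Whether the library uses this estimator is not stated here, so treat it as a generic illustration; the function name and the convention that events equals 1 when the event occurred are assumptions.

```python
def nelson_aalen_at(times, events, bins):
    """Nelson-Aalen cumulative hazard estimate evaluated at each point in bins.

    times  : observed durations
    events : 1 if the event occurred, 0 if the observation was censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    increments = []  # (event time, hazard increment at that time)
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        increments.append((t, deaths / at_risk))
    # Cumulative hazard at b is the sum of increments at event times <= b
    return [sum(inc for t, inc in increments if t <= b) for b in bins]


print(nelson_aalen_at([1, 2, 3, 4], [1, 0, 1, 1], bins=[0, 1, 3, 5]))
# [0, 0.25, 0.75, 1.75]
```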

translate(describe): Rename features for each node by dictionary

visualize(path_dir=None, **kwargs): Build graphviz representation of the tree

Parameters

path_dir : str
kwargs : dict
    Keyword arguments for nodes

Returns

dot : graphviz.Digraph
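To show the kind of structure a graphviz representation encodes, the sketch below writes DOT source text by hand for a tree stored as plain dictionaries. It does not use the graphviz package and is not how the library builds its Digraph; the function name and the input format are assumptions.

```python
def tree_to_dot(edges_by_node, labels):
    """Build DOT source: one labelled node per tree node, one edge per child link."""
    lines = ["digraph tree {"]
    for numb, label in labels.items():
        lines.append(f'    {numb} [label="{label}"];')
    for parent, children in edges_by_node.items():
        for child in children:
            lines.append(f"    {parent} -> {child};")
    lines.append("}")
    return "\n".join(lines)


edges = {0: [1, 2], 1: [], 2: []}
labels = {0: "age <= 50", 1: "size = 20", 2: "size = 15"}
print(tree_to_dot(edges, labels))
```

The resulting text can be rendered by any tool that accepts DOT input.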