Tree

class survivors.tree.node.Node(df, numb=0, full_rule=[], depth=0, features=[], categ=[], woe=False, verbose=0, **info)

Node of a decision tree. Splits the data into two child nodes.

Attributes

df : Pandas DataFrame

Data of the Node

numb : int

Number or name of the Node

full_rule : list

Rules from the root

depth : int

Distance from the root

edges : array-like

Numbers of child nodes

rule_edges : array-like

Rules for child nodes

features : list

Available features

categ : list

Names of categorical features

woe : boolean

Mode of categorical preparation

is_leaf : boolean

True if node is terminal (there are no child nodes)

verbose : int

Print the best split of the node

info : dict

Parameters for finding the best splitting

leaf_model : LeafModel

Stratified model selected by the leaf_model parameter (mapped through LEAF_MODEL_DICT)

Methods

check_params : Fill empty parameters and map max_features to int
find_best_split : Choose the best split of the node according to parameters
split : Find the best split of the df sample and create child nodes
set_edges : Set numbers of child nodes from the hash table of the main tree
set_leaf : Delete child nodes and set the node as terminal

predict : Return statistic values of the data
predict_scheme : Return all possible outcomes for additional features determination

prepare_df_for_attr : Convert input values to numpy format (and fill missing features)
get_edges : Define appropriate child nodes according to input values
get_full_rule : Convert full_rule to a string format

get_figure : Create a picture of the data (hist, survival function)
get_description : Return common values of the data (size, depth, death, cens)

set_dot_node : Add this node to the graphviz dot
set_dot_edges : Add child nodes to the graphviz dot
translate : Replace rules and features by dictionary

find_best_split()

Sort through all combinations of splitting the sample by features. Find a split with the highest statistical value.
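The exhaustive search described above can be illustrated with a small sketch that does not use the library. The function name `best_split_sketch`, the `min_samples` parameter, and the scoring rule (absolute difference of the child means) are all assumptions made for this example; the statistic the node actually maximizes is presumably set through its `info` parameters.

```python
from statistics import mean


def best_split_sketch(rows, features, target, min_samples=1):
    """Return (score, feature, threshold) of the highest-scoring binary split.

    Assumption: the score is the absolute difference between the target means
    of the two children. This is a stand-in, not the library's criterion.
    """
    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [row[target] for row in rows if row[feature] <= threshold]
            right = [row[target] for row in rows if row[feature] > threshold]
            if len(left) < min_samples or len(right) < min_samples:
                continue
            score = abs(mean(left) - mean(right))
            if best is None or score > best[0]:
                best = (score, feature, threshold)
    return best


rows = [
    {"age": 30, "dose": 1.0, "time": 10.0},
    {"age": 40, "dose": 2.0, "time": 12.0},
    {"age": 60, "dose": 1.5, "time": 3.0},
    {"age": 70, "dose": 2.5, "time": 2.0},
]
best_split = best_split_sketch(rows, ["age", "dose"], "time")
```

Here the split on `age` at 50.0 separates the long and short times most clearly, so it wins over every `dose` threshold.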

get_comb_fast(features)

Create set of all triplets with two target variables and one splitting feature

get_description(lang='en')

Build description for graphviz node

Returns

label : str

multiline string of description

get_figure(mode='hist', bins=None, target='cens', save_path='', lang='en')

Save background image for graphviz node

Parameters

target : str

Described feature

mode : str
  • “hist” : plot histogram of target
  • “kde” : plot smooth density of target
  • “surv” : plot survival function (doesn’t use target)

bins : list

Points for survival function

save_path : str

External path to save image

ind_for_nodes(X_attr, best_split, is_categ)

Map each sample to the number of the corresponding child node, according to the rule and the sample's features
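A minimal sketch of such a mapping, written without the library. The helper name `child_indices`, the `nan_child` parameter, and the convention that child 0 receives values satisfying the rule are assumptions for illustration only.

```python
import math


def child_indices(values, best_split, is_categ, nan_child=0):
    """Return the child index (0 or 1) for every value of the split feature.

    best_split is a threshold for continuous features and a collection of
    categories for categorical ones. Missing values go to nan_child.
    """
    indices = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            indices.append(nan_child)
        elif is_categ:
            indices.append(0 if value in best_split else 1)
        else:
            indices.append(0 if value <= best_split else 1)
    return indices


continuous = child_indices([1.0, 5.0, float("nan")], 3.0, is_categ=False, nan_child=1)
categorical = child_indices(["a", "c", None], {"a", "b"}, is_categ=True)
```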

predict(X, target, bins=None)

Predict target values for X data

Parameters

X : Pandas dataframe

Contains input features of events

target : str or function

Column name, mode, or aggregate function of a leaf sample.

Column name : must be in dataset.columns; returns the mean of the feature

Mode :
  • “surv” : return survival function
  • “hazard” : return cumulative hazard function
  • attribute_name : return attribute value (e.g. depth, numb)
  • feature_name : return aggregate statistic value for node

bins : array-like

Points of timeline

Returns

res : array-like

Values by target
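To make the “surv” mode more concrete, the sketch below evaluates a survival function on a list of timeline points using the Kaplan-Meier estimator. This is only an assumption about what a leaf model might compute: the source does not state which estimator the library uses, and the function name `kaplan_meier_at` is hypothetical.

```python
def kaplan_meier_at(times, events, bins):
    """Evaluate the Kaplan-Meier survival estimate at each point in bins.

    times  : observed durations
    events : 1 if the event occurred, 0 if the observation was censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = []
    for point in bins:
        prob = 1.0
        for t in event_times:
            if t > point:
                break
            at_risk = sum(1 for u in times if u >= t)
            deaths = sum(1 for u, e in zip(times, events) if u == t and e)
            # Multiply by the conditional probability of surviving past t.
            prob *= 1.0 - deaths / at_risk
        survival.append(prob)
    return survival


survival = kaplan_meier_at(times=[2, 3, 3, 5], events=[1, 1, 0, 1], bins=[1, 2, 3, 6])
```

The result has one value per point in `bins`, which matches the shape implied by the `bins` parameter: the curve starts at 1.0 before the first event and drops at each observed event time.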

predict_scheme(X, scheme_feats)

Build the scheme of possible outcomes for X data, used to determine additional features

Parameters

X : Pandas dataframe

Contains input features of events

scheme_feats : list

Needed features from node

Returns

res : Scheme

Aggregation entity of node information

set_dot_edges(dot)

Set edges for graphviz by node structure

set_dot_node(dot, path_dir='', depth=None, lang='en', **args)

Set node for graphviz dot with image and label

split()

Find best split of df sample and create child nodes

translate(describe)

Rename features in node by dictionary

class survivors.tree.node.Rule(feature: str, condition: str, has_nan: int)

Splitting rule of a decision tree node: the feature, the condition applied to it, and a flag for missing values.

Attributes

feature : str

Name of feature for splitting

condition : str

Operation for splitting

has_nan : bool

Flag of the missing values in node

Methods

get_feature : Return feature
get_condition : Return condition
translate : Replace rule by dictionary
to_str : Transforming to linear form
print : Print all attributes and descriptions

print()

Print all attributes and descriptions

translate(describe: dict)

Rename feature in rule
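The renaming behaviour can be sketched with a small stand-in class. `RuleSketch` is not the library's `Rule`: the fallback for features missing from the dictionary and the string layout returned by `to_str` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RuleSketch:
    feature: str
    condition: str
    has_nan: int

    def translate(self, describe):
        # Assumption: a feature missing from the dictionary keeps its name.
        self.feature = describe.get(self.feature, self.feature)

    def to_str(self):
        # Hypothetical linear form; the library's exact format may differ.
        suffix = " or nan" if self.has_nan else ""
        return f"({self.feature} {self.condition}){suffix}"


rule = RuleSketch(feature="age", condition="<= 50.0", has_nan=1)
rule.translate({"age": "Age, years"})
text = rule.to_str()
```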

class survivors.tree.decision_tree.CRAID(depth=0, random_state=123, features=[], categ=[], cut=False, balance=None, **info)

Survival decision tree model.

Attributes

nodes : dict

Dictionary of all tree’s nodes (numbers from hierarchy)

cut : boolean

Flag of pruning

balance : boolean

Flag of source data balancing

depth : int

Maximal depth of nodes

features : list

Available features

categ : list

Names of categorical features

random_state : int

Fixed seed for building reproducibility

name : str

Model’s name

bins : array-like

Points of timeline.

info : dict

Parameters for building nodes

Methods

fit : build decision tree with X, y data (iterative splitting node)
predict : return values of features, rules or schemes
predict_at_times : return survival or hazard function
predict_schemes : return FilledSchemeStrategy or Scheme
cut_tree : pruning function

visualize : build graphviz Digraph for each node
translate : Replace rules and features by dictionary

get_leaf_numbers : return leaf numbers from nodes
get_spanning_leaf_numbers : return pre-leaves numbers from nodes
delete_leaves_by_span : set up pre-leaves from lists to leaves

cut_tree(X, target, mode_f=<function roc_auc_score>, choose_f=<built-in function max>)

Prune the tree. Find the subtree that achieves the best value of the “mode_f” metric.

Parameters

X : Pandas dataframe

Contains input features of events.

target : str

Feature name for metric counting.

mode_f : function, optional

Metric for selecting. The default is roc_auc_score.

choose_f : function, optional

Type of best value (max or min). The default is max.
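How a metric and a choice function combine can be sketched as follows. This is not the library's pruning routine: the helper `choose_end_list`, the representation of a subtree as a tuple of leaf numbers, and the precomputed scores are all hypothetical. The `score` callable stands in for the value of `mode_f` computed on one candidate subtree's predictions.

```python
def choose_end_list(candidates, score, choose_f=max):
    """Return (end_list, value) for the candidate subtree that choose_f prefers."""
    scored = [(score(end_list), end_list) for end_list in candidates]
    best_value = choose_f(value for value, _ in scored)
    # Ties are resolved in favour of the earliest candidate.
    for value, end_list in scored:
        if value == best_value:
            return end_list, value


# Hypothetical metric values for three nested subtrees, keyed by leaf numbers.
scores = {(2, 3, 4): 0.71, (1, 2): 0.78, (0,): 0.50}
candidates = list(scores)

best_max = choose_end_list(candidates, scores.get)
best_min = choose_end_list(candidates, scores.get, choose_f=min)
```

Passing `min` instead of `max` suits metrics where lower is better, such as an error measure.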

delete_leaves_by_span(list_span_leaf)

Set pre-termination nodes as leaves

get_spanning_leaf_numbers()

Get pre-termination nodes (nodes whose two child edges both lead to leaves)
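The notion of a pre-termination node, and the effect of turning such nodes into leaves, can be shown on a plain dictionary of child numbers. The helper names and the `edges` layout below are assumptions for illustration and do not reflect how the library stores its nodes.

```python
def pre_leaf_numbers(edges):
    """Return numbers of nodes whose children are all leaves.

    edges maps a node number to the list of its child numbers
    (an empty list marks a leaf).
    """
    leaves = {numb for numb, children in edges.items() if not children}
    return sorted(
        numb
        for numb, children in edges.items()
        if children and all(child in leaves for child in children)
    )


def collapse_pre_leaves(edges, list_span_leaf):
    """Return a copy of edges in which the listed nodes have become leaves."""
    pruned = {numb: list(children) for numb, children in edges.items()}
    for numb in list_span_leaf:
        for child in pruned[numb]:
            del pruned[child]
        pruned[numb] = []
    return pruned


edges = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
pre_leaves = pre_leaf_numbers(edges)
pruned = collapse_pre_leaves(edges, pre_leaves)
```

Repeating the two steps shrinks the tree one level at a time, which yields a sequence of nested subtrees.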

predict(X, mode='target', target='time', end_list=[], bins=None)

Return values by mode & target

Parameters

X : Pandas dataframe

Contains input features of events.

mode : str, optional

Mode of predicting. The default is “target”.
  • “surv” : return values of survival function in bins
  • “hazard” : return values of hazard function in bins
  • “target” : return values of feature (in target variable)
  • “rules” : return full rules from node to leaf

target : str or list, optional

Target of prediction. The default is “time”.

end_list : list, optional

Numbers of endpoint nodes (for cutting)

bins : array-like, optional

Points of timeline

Returns

res : array-like

Values by mode & target

predict_at_times(X, bins, mode='surv')

Return survival or hazard function.

Parameters

X : Pandas dataframe

Contains input features of events.

bins : array-like

Points of timeline.

mode : str, optional

Type of function. The default is “surv”.
  • “surv” : build the survival function in the nodes
  • “hazard” : build the hazard function in the nodes

Returns

res : array-like

Vector of function values in times (bins)

translate(describe)

Rename features for each node by dictionary

visualize(path_dir=None, **kwargs)

Build graphviz representation of the tree

Parameters

path_dir : str

kwargs : dict

Keyword arguments for nodes

Returns

dot : graphviz.Digraph