Tree
- class survivors.tree.node.Node(df, numb=0, full_rule=[], depth=0, features=[], categ=[], woe=False, verbose=0, **info)
Node of a decision tree. Splits its data between two child nodes.
Attributes
- df : Pandas DataFrame
Data of the Node
- numb : int
Number or name of the Node
- full_rule : list
Rules from the root
- depth : int
Distance from the root
- edges : array-like
Numbers of child nodes
- rule_edges : array-like
Rules for child nodes
- features : list
Available features
- categ : list
Names of categorical features
- woe : boolean
Mode of categorical preparation
- is_leaf : boolean
True if node is terminal (there are no child nodes)
- verbose : int
Print the best split of the node
- info : dict
Parameters for finding the best splitting
- leaf_model : LeafModel
Stratified model selected by the leaf_model parameter (mapped through LEAF_MODEL_DICT)
Methods
check_params : Fill empty parameters and map max_features to int
find_best_split : Choose best split of node according to parameters
split : Find best split of df sample and create child nodes
set_edges : Set number of child nodes from hash table of main tree
set_leaf : Delete child nodes and set node as terminal
predict : Return statistic values of a data
predict_scheme : Return all possible outcomes for additional features determination
prepare_df_for_attr : Set input values to numpy format (and fill missing features)
get_edges : Define appropriate child nodes according to input values
get_full_rule : Convert full_rules to a string format
get_figure : Create picture of a data (hist, survival function)
get_description : Return common values of a data (size, depth, death, cens)
set_dot_node : Add self node to the graphviz dot
set_dot_edges : Add child nodes to the graphviz dot
translate : Replace rules and features by dictionary
- find_best_split()
Sort through all combinations of splitting the sample by features. Find a split with the highest statistical value.
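The exhaustive search described above can be sketched in a few lines. This is not the library's implementation: the names `best_split` and `mean_gap`, the list-of-dicts row format, and the scoring statistic (absolute difference in mean observed time between the two sides) are assumptions chosen to keep the example short. A survival tree would normally plug a survival-specific statistic into the same loop.

```python
def mean_gap(left, right):
    """Stand-in statistic: absolute difference in mean time between the sides."""
    return abs(sum(left) / len(left) - sum(right) / len(right))


def best_split(rows, features, target="time", score=mean_gap):
    """Try every threshold of every feature; return the highest-scoring split.

    rows is a list of dicts. Hypothetical helper, not part of survivors.
    """
    best = None  # (score, feature, threshold)
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between neighbouring values,
        # so neither side of a split is ever empty.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            value = score(left, right)
            if best is None or value > best[0]:
                best = (value, feature, threshold)
    return best


rows = [
    {"age": 30, "dose": 1, "time": 10},
    {"age": 40, "dose": 2, "time": 12},
    {"age": 60, "dose": 1, "time": 3},
    {"age": 70, "dose": 2, "time": 2},
]
split = best_split(rows, ["age", "dose"])
```

In this toy sample the split on `age` at 50.0 separates long and short observed times most sharply, so it wins over every `dose` threshold.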
- get_comb_fast(features)
Create set of all triplets with two target variables and one splitting feature
- get_description(lang='en')
Build description for graphviz node
Returns
- label : str
Multiline string of description
- get_figure(mode='hist', bins=None, target='cens', save_path='', lang='en')
Save background image for graphviz node
Parameters
- target : str
Described feature
- mode : str
    “hist” : plot histogram of target
    “kde” : plot smooth density of target
    “surv” : plot survival function (doesn’t use target)
- bins : list
Points for survival function
- save_path : str
External path to save image
- ind_for_nodes(X_attr, best_split, is_categ)
Return the number of the corresponding child node for each sample, based on the split rule and the sample's feature values
- predict(X, target, bins=None)
Predict target values for X data
Parameters
- X : Pandas dataframe
Contains input features of events
- target : str or function
Column name, mode or aggregate function of a leaf sample
    Column name : must be in dataset.columns; returns the mean of the feature
    Mode :
        “surv” : return the survival function
        “hazard” : return the cumulative hazard function
        attribute_name : return the attribute value (e.g. depth, numb)
        feature_name : return the aggregate statistic value for the node
- bins : array-like
Points of timeline
Returns
- res : array-like
Values by target
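To make the “surv” mode and the role of bins concrete, the sketch below evaluates a Kaplan-Meier survival curve at given timeline points for the observations of a single leaf. It assumes the leaf uses a Kaplan-Meier estimate, which this page does not confirm (the leaf_model attribute suggests the estimator is configurable). The function name `kaplan_meier_at` and the sample data are hypothetical.

```python
def kaplan_meier_at(times, events, bins):
    """Evaluate a Kaplan-Meier survival curve at each point in bins.

    times  : observed durations
    events : 1 if the event occurred, 0 if the observation was censored
    Hypothetical helper, not part of survivors.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve = []  # (time, survival probability from that time onward)
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        died = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - died / at_risk
        curve.append((t, surv))

    result = []
    for b in bins:
        value = 1.0  # before the first event the estimate is 1
        for t, s in curve:
            if t <= b:
                value = s  # step function: keep the last step at or before b
        result.append(value)
    return result


leaf_times = [2, 3, 3, 5, 8]
leaf_events = [1, 1, 0, 1, 0]
surv = kaplan_meier_at(leaf_times, leaf_events, bins=[1, 2, 4, 6])
```

The result has one value per entry of bins, which matches the documented shape of res: a vector of function values along the timeline.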
- predict_scheme(X, scheme_feats)
Return a Scheme that aggregates the node's information for X data
Parameters
- X : Pandas dataframe
Contains input features of events
- scheme_feats : list
Needed features from node
Returns
- res : Scheme
Aggregation entity of node information
- set_dot_edges(dot)
Set edges for graphviz by node structure
- set_dot_node(dot, path_dir='', depth=None, lang='en', **args)
Set node for graphviz dot with image and label
- split()
Find best split of df sample and create child nodes
- translate(describe)
Rename features in node by dictionary
- class survivors.tree.node.Rule(feature: str, condition: str, has_nan: int)
Splitting rule of a decision tree node. Stores the feature and the condition used to separate the data.
Attributes
- feature : str
Name of feature for splitting
- condition : str
Operation for splitting
- has_nan : bool
Flag of the missing values in node
Methods
get_feature : Return feature
get_condition : Return condition
translate : Replace rule by dictionary
to_str : Transform to linear form
print : Print all attributes and descriptions
- print()
Print all attributes and descriptions
- translate(describe: dict)
Rename feature in rule
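A dictionary-based rename like the one described above can be sketched with a small stand-in class. `RuleSketch` is hypothetical and only mirrors the three documented constructor fields. The behaviour for unknown feature names (keep them unchanged) and the string layout produced by `to_str` are assumptions, not the library's confirmed behaviour.

```python
from dataclasses import dataclass


@dataclass
class RuleSketch:
    """Hypothetical stand-in for Rule; mirrors only the documented fields."""
    feature: str
    condition: str
    has_nan: int

    def translate(self, describe: dict) -> None:
        # Replace the feature name if the dictionary knows it; otherwise keep it.
        self.feature = describe.get(self.feature, self.feature)

    def to_str(self) -> str:
        # Assumed linear form: "(feature condition)", with a marker for missing values.
        text = f"({self.feature} {self.condition})"
        return text + " | nan" if self.has_nan else text


rule = RuleSketch(feature="age", condition="<= 50.0", has_nan=0)
rule.translate({"age": "Age (years)"})
readable = rule.to_str()
```

Keeping the rename in one method means a whole tree can be relabelled by calling it on every rule with the same dictionary.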
- class survivors.tree.decision_tree.CRAID(depth=0, random_state=123, features=[], categ=[], cut=False, balance=None, **info)
Survival decision tree model.
Attributes
- nodes : dict
Dictionary of all tree’s nodes (numbers from hierarchy)
- cut : boolean
Flag of pruning
- balance : boolean
Flag of source data balancing
- depth : int
Maximal depth of nodes
- features : list
Available features
- categ : list
Names of categorical features
- random_state : int
Fixed seed for building reproducibility
- name : str
Model’s name
- bins : array-like
Points of timeline.
- info : dict
Parameters for building nodes
Methods
fit : Build decision tree with X, y data (iterative splitting node)
predict : Return values of features, rules or schemes
predict_at_times : Return survival or hazard function
predict_schemes : Return FilledSchemeStrategy or Scheme
cut_tree : Pruning function
visualize : Build graphviz Digraph for each node
translate : Replace rules and features by dictionary
get_leaf_numbers : Return leaf numbers from nodes
get_spanning_leaf_numbers : Return pre-leaves numbers from nodes
delete_leaves_by_span : Set up pre-leaves from lists to leaves
- cut_tree(X, target, mode_f=<function roc_auc_score>, choose_f=<built-in function max>)
Prune the tree. Find the subtree that achieves the best value of the “mode_f” metric.
Parameters
- X : Pandas dataframe
Contains input features of events.
- target : str
Feature name used to compute the metric.
- mode_f : function, optional
Metric for selecting. The default is roc_auc_score.
- choose_f : function, optional
Type of best value (max or min). The default is max.
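The roles of mode_f and choose_f can be illustrated with a small selection step: score each candidate subtree with the metric, then let choose_f pick the winning score. This is a sketch of that idea only. The helper `choose_subtree`, the stand-in `accuracy` metric (used here instead of roc_auc_score to avoid extra dependencies), and the candidate labels are all hypothetical, and how the library generates candidate subtrees is not described on this page.

```python
def accuracy(y_true, y_pred):
    """Stand-in metric: share of matching labels (higher is better)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def choose_subtree(candidates, y_true, mode_f, choose_f=max):
    """Score every candidate with mode_f and pick one with choose_f.

    candidates maps a subtree label to its predictions for the same rows.
    Hypothetical helper; how candidates are generated is not shown here.
    """
    scores = {name: mode_f(y_true, pred) for name, pred in candidates.items()}
    best_score = choose_f(scores.values())
    best_name = next(name for name, s in scores.items() if s == best_score)
    return best_name, best_score


y_true = [1, 0, 0, 1]
candidates = {
    "full tree": [1, 0, 1, 1],
    "one level pruned": [1, 0, 0, 1],
    "root only": [1, 1, 1, 1],
}
best = choose_subtree(candidates, y_true, mode_f=accuracy)
worst = choose_subtree(candidates, y_true, mode_f=accuracy, choose_f=min)
```

Passing min as choose_f makes sense when mode_f is an error measure, where a lower value is better.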
- delete_leaves_by_span(list_span_leaf)
Set pre-termination nodes as leaves
- get_spanning_leaf_numbers()
Get pre-termination nodes (have two leaves in edges)
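The idea of pre-termination nodes can be shown on a plain dictionary tree. The sketch below is not the library's code: it assumes a hypothetical layout in which `edges` maps each node number to the list of its child numbers, with an empty list for a leaf. The two free functions borrow the documented method names only for readability.

```python
def spanning_leaf_numbers(edges):
    """Return nodes whose children are all leaves.

    edges maps a node number to the list of its child numbers;
    a leaf maps to an empty list. Hypothetical layout for illustration.
    """
    return [
        numb
        for numb, children in edges.items()
        if children and all(not edges[child] for child in children)
    ]


def delete_leaves_by_span(edges, span_numbers):
    """Turn each listed pre-leaf into a leaf by removing its children."""
    for numb in span_numbers:
        for child in edges[numb]:
            del edges[child]
        edges[numb] = []


tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
pre_leaves = spanning_leaf_numbers(tree)
delete_leaves_by_span(tree, pre_leaves)
```

After node 1 is collapsed, the root itself becomes a pre-termination node, which is how repeated application shrinks a tree one level at a time.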
- predict(X, mode='target', target='time', end_list=[], bins=None)
Return values by mode & target
Parameters
- X : Pandas dataframe
Contains input features of events.
- mode : str, optional
Mode of predicting. The default is “target”.
    “surv” : return values of survival function in bins
    “hazard” : return values of hazard function in bins
    “target” : return values of feature (in target variable)
    “rules” : return full rules from node to leaf
- target : str or list, optional
An aim of predicting. The default is “time” (the occurred time).
- end_list : list, optional
Numbers of endpoint nodes (for cutting)
- bins : array-like, optional
Points of timeline
Returns
- res : array-like
Values by mode & target
- predict_at_times(X, bins, mode='surv')
Return survival or hazard function.
Parameters
- X : Pandas dataframe
Contains input features of events.
- bins : array-like
Points of timeline.
- mode : str, optional
Type of function. The default is “surv”.
    “surv” : build the survival function in the nodes
    “hazard” : build the hazard function in the nodes
Returns
- res : array-like
Vector of function values in times (bins)
- translate(describe)
Rename features for each node by dictionary