A parameter that may change with node depth.
A parameter that changes exponentially with the form
f = c + m * b^(k * d)
where:
c: constant bias
b: base
m: multiplier
k: depth multiplier
d: depth
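A minimal sketch of evaluating this formula at a given depth. The function name `exponential_param` and its keyword arguments are hypothetical stand-ins for c, b, m and k, not the library's API.

```python
def exponential_param(depth: int, bias: float, base: float,
                      multiplier: float, depth_multiplier: float) -> float:
    """Return f = c + m * b^(k * d) for depth d (hypothetical helper)."""
    # c = bias, m = multiplier, b = base, k = depth_multiplier, d = depth
    return bias + multiplier * base ** (depth_multiplier * depth)


if __name__ == "__main__":
    # With c=10, m=1, b=2, k=1 the value grows as 10 + 2^d.
    for d in range(4):
        print(d, exponential_param(d, bias=10.0, base=2.0,
                                   multiplier=1.0, depth_multiplier=1.0))
```

With these settings the value doubles its excess over the bias at each level, so a large base or depth multiplier can make the parameter grow very quickly in deep trees.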
The statistics for *all* the examples seen at this leaf.
The statistics for the examples seen at this leaf after all the splits have been initialized. If post_init_leaf_stats.weight_sum is greater than 0, then all candidates have been initialized. We need to track both leaf_stats and post_init_leaf_stats because the first is used to create the decision_tree::Leaf and the second is used to infer the statistics for the right side of a split (given the left-side stats).
Tracks stats for each node. node_to_slot[i] is the FertileSlot for node i. This may be sized to max_nodes initially, or grow dynamically as needed.
This allows us to quickly track and calculate impurity (classification) by storing the sum of input weights and the sum of the squares of the per-class weight totals. Weighted Gini is then:
1 - square / (sum * sum)
Updates to these numbers are:
old_i = leaf->value(label)
new_i = old_i + incoming_weight
sum -> sum + incoming_weight
square -> square - (old_i ^ 2) + (new_i ^ 2)
total_left_sum -> total_left_sum - old_left_i * old_total_i + new_left_i * new_total_i
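A pure-Python sketch of the sum and square updates above, showing that each example costs O(1) work regardless of the number of classes. The class `IncrementalGini` is hypothetical and only mirrors the sum/square bookkeeping; it does not model total_left_sum or the actual C++ implementation.

```python
from collections import defaultdict


class IncrementalGini:
    """Hypothetical tracker for weighted Gini impurity.

    sum    = total weight seen
    square = sum over classes of (per-class weight total)^2
    """

    def __init__(self) -> None:
        self.class_weight = defaultdict(float)  # per-class weight totals
        self.sum = 0.0
        self.square = 0.0

    def add(self, label, incoming_weight: float = 1.0) -> None:
        old_i = self.class_weight[label]
        new_i = old_i + incoming_weight
        self.class_weight[label] = new_i
        self.sum += incoming_weight
        # Replace the old squared term for this class with the new one.
        self.square += new_i * new_i - old_i * old_i

    def gini(self) -> float:
        if self.sum == 0:
            return 0.0
        return 1.0 - self.square / (self.sum * self.sum)


if __name__ == "__main__":
    g = IncrementalGini()
    for label in [0, 0, 1, 1]:
        g.add(label)
    print(g.gini())  # two equally weighted classes
```

For two equally weighted classes this yields 1 - 8 / 16 = 0.5, the maximum two-class Gini impurity.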
Leaf models specify what is returned at inference time, and how it is stored in the decision_trees.Leaf protos.
The sum of the weights of the training examples that we have seen. This is here, outside of the leaf_stat oneof, because almost all types will want it.
TODO(thomaswc): Add in v5's SparseClassStats.
TODO(thomaswc): Move the GiniStats out of LeafStats and into something that only tracks them for splits.
This is the info needed for calculating variance for regression. Variance will still have to be summed over every output, but the number of outputs in regression problems is almost always 1.
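A minimal sketch of how variance can be derived from running weighted means of each output and of its square, then summed over every output. The class and its field names are hypothetical; the proto's actual layout may differ. Note that E[y^2] - E[y]^2 can lose precision when the variance is tiny relative to the mean.

```python
from typing import List, Sequence


class RegressionVarianceStats:
    """Hypothetical stand-in keeping weighted means of y and y^2 per output."""

    def __init__(self, num_outputs: int) -> None:
        self.weight_sum = 0.0
        self.mean_output: List[float] = [0.0] * num_outputs
        self.mean_output_squares: List[float] = [0.0] * num_outputs

    def add(self, y: Sequence[float], weight: float = 1.0) -> None:
        self.weight_sum += weight
        frac = weight / self.weight_sum
        for j, v in enumerate(y):
            # Incremental weighted mean: m <- m + (x - m) * w / W
            self.mean_output[j] += (v - self.mean_output[j]) * frac
            self.mean_output_squares[j] += (v * v - self.mean_output_squares[j]) * frac

    def total_variance(self) -> float:
        # Var = E[y^2] - E[y]^2, summed over every output.
        return sum(sq - m * m
                   for m, sq in zip(self.mean_output, self.mean_output_squares))


if __name__ == "__main__":
    stats = RegressionVarianceStats(num_outputs=1)
    stats.add([1.0])
    stats.add([3.0])
    print(stats.total_variance())
```

With a single output, as is typical, the sum reduces to one variance term.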
A parameter that changes linearly with depth, with upper and lower bounds.
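A minimal sketch of a linear, clamped depth-dependent parameter. The argument names `slope`, `y_intercept`, `min_val` and `max_val` are assumptions chosen for illustration, not confirmed field names.

```python
def linear_param(depth: int, slope: float, y_intercept: float,
                 min_val: float, max_val: float) -> float:
    """Return y_intercept + slope * depth, clamped to [min_val, max_val]."""
    value = y_intercept + slope * depth
    return max(min_val, min(max_val, value))


if __name__ == "__main__":
    for d in (0, 1, 2, 10):
        print(d, linear_param(d, slope=2.0, y_intercept=1.0,
                              min_val=0.0, max_val=5.0))
```

The bounds keep a positive slope from growing without limit in deep trees and a negative slope from going below a usable floor.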
Proto representing a potential node.
Right counts are inferred from FertileSlot.leaf_stats and the left stats.
Right stats (not full counts) are kept here.
Fields used when training with a graph runner.
Allows selection of operations on the collection of split candidates. Basic infers right split stats from the leaf stats and each candidate's left stats.
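A minimal sketch of the Basic inference step: because every example at the leaf goes either left or right, the right-side weights are the leaf totals minus the left-side weights. The dict-of-class-weights representation and the function name are hypothetical simplifications of the proto messages. The subtraction is only valid when both sets of stats cover the same examples, which is why the post-initialization leaf stats are tracked separately.

```python
from typing import Dict


def infer_right_stats(leaf_stats: Dict[str, float],
                      left_stats: Dict[str, float]) -> Dict[str, float]:
    """Right-side class weights = leaf totals minus left-side weights."""
    return {label: total - left_stats.get(label, 0.0)
            for label, total in leaf_stats.items()}


if __name__ == "__main__":
    leaf = {"a": 5.0, "b": 3.0}
    left = {"a": 2.0}
    print(infer_right_stats(leaf, left))
```

Storing only the left side halves the per-candidate memory, at the cost of one subtraction per class when a split is evaluated.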
Configure how often we check for finish, because some finish methods are expensive to perform.
Finish strategies define when slots are considered finished. Basic requires at least split_after_samples samples and does not allow slots to finish until the leaf has received more than one class. Hoeffding splits early, after min_split_samples, if one split dominates the rest according to Hoeffding bounds. Bootstrap does the same but compares Gini impurities calculated from sampled, smoothed counts.
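A minimal sketch of the Basic finish rule only: a slot may finish once it has seen at least split_after_samples samples and more than one class has nonzero weight. The function and its arguments are hypothetical; the Hoeffding and Bootstrap strategies are not modeled here.

```python
from typing import Dict


def basic_should_finish(class_weights: Dict[int, float],
                        split_after_samples: int,
                        num_samples: int) -> bool:
    """Sketch of the Basic rule: enough samples and more than one class."""
    if num_samples < split_after_samples:
        return False
    seen_classes = sum(1 for w in class_weights.values() if w > 0)
    return seen_classes > 1


if __name__ == "__main__":
    print(basic_should_finish({0: 5.0}, split_after_samples=4, num_samples=5))
    print(basic_should_finish({0: 3.0, 1: 2.0}, split_after_samples=4, num_samples=5))
```

A pure leaf never finishes under this rule, since no split could reduce its impurity.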
Pruning strategies define how candidates are pruned over time. SPLIT_PRUNE_HALF prunes the worst half of splits every prune_every_samples, etc. Note that prune_every_samples interacts with the depth-dependent split_after_samples, so they should be set together.
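A minimal sketch of a SPLIT_PRUNE_HALF-style step, assuming each candidate is a (split_id, gini) pair where lower Gini is better. The helper names, the pair representation, and rounding the kept count up (so a lone candidate survives) are all assumptions for illustration.

```python
from typing import List, Tuple


def should_prune(num_samples: int, prune_every_samples: int) -> bool:
    """Prune on every multiple of prune_every_samples (hypothetical schedule)."""
    return (prune_every_samples > 0 and num_samples > 0
            and num_samples % prune_every_samples == 0)


def prune_worst_half(candidates: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the better half of (split_id, gini) pairs; lower Gini is better."""
    ranked = sorted(candidates, key=lambda c: c[1])
    keep = (len(ranked) + 1) // 2  # round up so one candidate always survives
    return ranked[:keep]


if __name__ == "__main__":
    cands = [("a", 0.4), ("b", 0.1), ("c", 0.3), ("d", 0.2)]
    if should_prune(num_samples=100, prune_every_samples=50):
        print(prune_worst_half(cands))
```

If prune_every_samples is small relative to split_after_samples, repeated halving can leave very few candidates by the time the slot finishes, which is why the two should be tuned together.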
SPLIT_PRUNE_HOEFFDING prunes splits whose Gini impurity is worse than the best split's by more than the Hoeffding bound.
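A minimal sketch of Hoeffding-bound pruning using the standard bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)). The confidence parameter `delta`, the range `value_range = 1.0` (a conservative range for Gini impurity), and the function names are assumptions, not the library's settings.

```python
import math
from typing import Dict


def hoeffding_bound(value_range: float, delta: float, n: float) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def hoeffding_prune(gini_by_split: Dict[str, float], n: float,
                    delta: float = 0.05,
                    value_range: float = 1.0) -> Dict[str, float]:
    """Drop splits whose Gini exceeds the best split's by more than epsilon."""
    best = min(gini_by_split.values())
    eps = hoeffding_bound(value_range, delta, n)
    return {s: g for s, g in gini_by_split.items() if g - best <= eps}


if __name__ == "__main__":
    ginis = {"a": 0.20, "b": 0.22, "c": 0.40}
    # With n=1000 and delta=0.05, epsilon is roughly 0.039.
    print(hoeffding_prune(ginis, n=1000))
```

Because epsilon shrinks as 1/sqrt(n), pruning becomes more aggressive as the slot accumulates samples.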
Stats models generally specify the information that must be collected to choose a split at a node. Specifically, they operate on a SplitCandidate::LeafStat proto.
STATS_SPARSE_THEN_DENSE_GINI is deprecated and no longer supported.
------------ Types that control training subsystems ------ //
--------- Parameters that can't change by definition --------------- //
Some booleans controlling execution
Number of classes (classification) or targets (regression)
--------- Parameters that could be depth-dependent --------------- //
--------- Parameters for experimental features ---------------------- //
When using a FixedSizeSparseClassificationGrowStats, keep track of this many classes.
A parameter that is 'off' until depth >= a threshold, then is 'on'.
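A minimal sketch of a threshold parameter. The argument names `threshold`, `on_value` and `off_value` are hypothetical; the original only states that the parameter is off below the threshold depth and on at or above it.

```python
def threshold_param(depth: int, threshold: int,
                    on_value: float, off_value: float) -> float:
    """Return off_value until depth >= threshold, then on_value."""
    return on_value if depth >= threshold else off_value


if __name__ == "__main__":
    for d in range(5):
        print(d, threshold_param(d, threshold=3, on_value=1.0, off_value=0.0))
```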
Proto used for tracking tree paths during inference time.
Nodes are listed in the order in which they were traversed, i.e. nodes_visited[0] is the tree's root node.
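A minimal sketch of recording a tree path during inference, so that the first recorded id is always the root. The `Node` class, the feature/threshold split test, and the "go left when x <= threshold" convention are hypothetical simplifications, not the decision_trees proto format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    feature: int = -1          # -1 marks a leaf in this sketch
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def traverse(root: Node, x: List[float]) -> List[int]:
    """Return the ids of the nodes visited, root first."""
    nodes_visited: List[int] = []
    node: Optional[Node] = root
    while node is not None:
        nodes_visited.append(node.node_id)
        if node.feature < 0:  # reached a leaf
            break
        node = node.left if x[node.feature] <= node.threshold else node.right
    return nodes_visited


if __name__ == "__main__":
    root = Node(0, feature=0, threshold=0.5, left=Node(1),
                right=Node(2, feature=1, threshold=1.0,
                           left=Node(3), right=Node(4)))
    print(traverse(root, [0.9, 2.0]))
```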