package tensorflow.tensorforest

A parameter that may change with node depth.

Used in: SplitFinishConfig, SplitPruningConfig, TensorForestParams

oneof ParamType
- float constant_value = 1
- LinearParam linear = 2
- ExponentialParam exponential = 3
- ThresholdParam threshold = 4

A parameter that changes exponentially with depth, of the form f = c + m * b^(k * d), where:
- c: constant bias
- b: base
- m: multiplier
- k: depth multiplier
- d: depth

Used in: DepthDependentParam

float bias = 1
float base = 2
float multiplier = 3
float depth_multiplier = 4
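
As a minimal sketch of the formula above, the function below evaluates f = c + m * b^(k * d) for a given depth. The function name and argument order are illustrative only; they mirror the proto field names but are not part of any TensorFlow API.

```python
def exponential_param(depth, bias, base, multiplier, depth_multiplier):
    """Evaluate bias + multiplier * base ** (depth_multiplier * depth).

    Hypothetical helper: the arguments mirror the ExponentialParam fields.
    """
    return bias + multiplier * base ** (depth_multiplier * depth)


# With bias=1, base=2, multiplier=0.5, depth_multiplier=1 the value
# doubles its variable part at every level of depth.
values_by_depth = [exponential_param(d, 1.0, 2.0, 0.5, 1.0) for d in range(4)]
```

With base > 1 and a positive depth_multiplier the value grows with depth; a base between 0 and 1 makes it decay toward the bias.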

Used in: FertileStats

optional LeafStat leaf_stats = 4
The statistics for *all* the examples seen at this leaf.
repeated SplitCandidate candidates = 1
optional LeafStat post_init_leaf_stats = 6
The statistics for the examples seen at this leaf after all the splits have been initialized. If post_init_leaf_stats.weight_sum is > 0, then all candidates have been initialized. We need to track both leaf_stats and post_init_leaf_stats because the first is used to create the decision_tree::Leaf and the second is used to infer the statistics for the right side of a split (given the left side stats).
int32 node_id = 5
int32 depth = 7

repeated FertileSlot node_to_slot = 1
Tracks stats for each node. node_to_slot[i] is the FertileSlot for node i. This may be sized to max_nodes initially, or grow dynamically as needed.

Used in: LeafStat.GiniImpurityClassificationStats

float square = 2
This allows us to quickly track and calculate impurity (classification) by storing the sum of input weights and the sum of the squares of the input weights. Weighted gini is then: 1 - square / (sum * sum). Updates to these numbers are:
- old_i = leaf->value(label)
- new_i = old_i + incoming_weight
- sum -> sum + incoming_weight
- square -> square - (old_i ^ 2) + (new_i ^ 2)
- total_left_sum -> total_left_sum - old_left_i * old_total_i + new_left_i * new_total_i
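
The incremental update described above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, the impurity is read as 1 - square / (sum * sum), and the total_left_sum update is left out because the surrounding text does not define its inputs.

```python
class GiniAccumulator:
    """Tracks per-class weights plus the running sum and sum of squares."""

    def __init__(self, num_classes):
        self.counts = [0.0] * num_classes  # accumulated weight per class
        self.sum = 0.0                     # total weight seen
        self.square = 0.0                  # sum over classes of count ** 2

    def add(self, label, incoming_weight):
        old_i = self.counts[label]
        new_i = old_i + incoming_weight
        self.counts[label] = new_i
        self.sum += incoming_weight
        # Swap the old squared count for the new one in O(1).
        self.square += new_i * new_i - old_i * old_i

    def gini(self):
        if self.sum == 0.0:
            return 0.0  # assumption: an empty leaf counts as pure
        return 1.0 - self.square / (self.sum * self.sum)


acc = GiniAccumulator(num_classes=2)
acc.add(0, 1.0)
acc.add(1, 1.0)
balanced = acc.gini()  # two classes with equal weight
acc.add(0, 2.0)
skewed = acc.gini()    # class 0 now holds 3 of 4 units of weight
```

Keeping square alongside sum means the impurity never has to be recomputed from the full count vector after each example.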

Leaf models specify what is returned at inference time, and how it is stored in the decision_trees.Leaf protos.

Used in: TensorForestParams

MODEL_DENSE_CLASSIFICATION = 0
MODEL_SPARSE_CLASSIFICATION = 1
MODEL_REGRESSION = 2
MODEL_SPARSE_OR_DENSE_CLASSIFICATION = 3

Used in: FertileSlot, SplitCandidate

float weight_sum = 3
The sum of the weights of the training examples that we have seen. This is here, outside of the leaf_stat oneof, because almost all types will want it.
oneof leaf_stat
- LeafStat.GiniImpurityClassificationStats classification = 1
- LeafStat.LeastSquaresRegressionStats regression = 2
  TODO(thomaswc): Add in v5's SparseClassStats.

TODO(thomaswc): Move the GiniStats out of LeafStats and into something that only tracks them for splits.

Used in: LeafStat

oneof counts
- decision_trees.Vector dense_counts = 1
- decision_trees.SparseVector sparse_counts = 2
optional GiniStats gini = 3

This is the info needed for calculating variance for regression. Variance will still have to be summed over every output, but the number of outputs in regression problems is almost always 1.

Used in: LeafStat

optional decision_trees.Vector mean_output = 1
optional decision_trees.Vector mean_output_squares = 2
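
As a rough sketch of how these two vectors could yield a variance, the snippet below applies Var[x] = E[x^2] - E[x]^2 per output and sums the results. It assumes, based only on the field names, that the vectors hold per-output means; if an implementation stores running sums instead, each entry would first need dividing by weight_sum. The function name is hypothetical.

```python
def total_variance(mean_output, mean_output_squares):
    """Sum over outputs of E[x^2] - E[x]^2.

    Assumes both lists hold per-output means (an assumption, see above).
    """
    return sum(
        mean_sq - mean * mean
        for mean, mean_sq in zip(mean_output, mean_output_squares)
    )


# Single-output example: targets 1 and 3 have mean 2 and mean square 5.
single_output_variance = total_variance([2.0], [5.0])
```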

A parameter that changes linearly with depth, with upper and lower bounds.

Used in: DepthDependentParam

float slope = 1
float y_intercept = 2
float min_val = 3
float max_val = 4
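
A minimal sketch of a linear parameter with bounds, assuming the value is slope * depth + y_intercept clamped into [min_val, max_val]. The clamping order and the function name are assumptions; the text only states that upper and lower bounds exist.

```python
def linear_param(depth, slope, y_intercept, min_val, max_val):
    """Evaluate the line at `depth`, then clamp to [min_val, max_val].

    Hypothetical helper: the arguments mirror the LinearParam fields.
    """
    value = slope * depth + y_intercept
    return min(max(value, min_val), max_val)


at_root = linear_param(0, 1.5, 1.0, 0.0, 3.0)  # unclamped value 1.0
deep = linear_param(2, 1.5, 1.0, 0.0, 3.0)     # unclamped value 4.0 hits max_val
```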

Used in: FertileSlot

optional decision_trees.BinaryNode split = 1
Proto representing the potential node.
optional LeafStat left_stats = 4
Right counts are inferred from FertileSlot.leaf_stats and left_stats.
optional LeafStat right_stats = 5
Right stats (not full counts) are kept here.
string unique_id = 6
Fields used when training with a graph runner.
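
The inference of right-side counts from the leaf totals and a candidate's left-side counts can be sketched as a per-class subtraction. The function below is illustrative only: it uses plain lists in place of the decision_trees.Vector protos, and its name is hypothetical.

```python
def infer_right_counts(leaf_counts, left_counts):
    """Right side = everything seen at the leaf minus what went left."""
    if len(leaf_counts) != len(left_counts):
        raise ValueError("leaf and left counts must cover the same classes")
    return [total - left for total, left in zip(leaf_counts, left_counts)]


# A leaf that saw 5 and 3 units of weight for two classes, of which
# 2 and 3 went to the left side of a candidate split.
right_counts = infer_right_counts([5.0, 3.0], [2.0, 3.0])
```

Storing only the left side per candidate halves the per-candidate bookkeeping, at the cost of one subtraction when a split is scored.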

Allows selection of operations on the collection of split candidates. Basic infers right split stats from the leaf stats and each candidate's left stats.

Used in: TensorForestParams

COLLECTION_BASIC = 0
GRAPH_RUNNER_COLLECTION = 1

Used in: TensorForestParams

optional DepthDependentParam check_every_steps = 1
Configure how often we check for finish, because some finish methods are expensive to perform.
SplitFinishStrategyType type = 2

Finish strategies define when slots are considered finished. Basic requires at least split_after_samples examples, and doesn't allow slots to finish until the leaf has received more than one class. Hoeffding splits early after min_split_samples if one split is dominating the rest according to Hoeffding bounds. Bootstrap does the same but compares Gini values calculated with sampled smoothed counts.

Used in: SplitFinishConfig

SPLIT_FINISH_BASIC = 0
SPLIT_FINISH_DOMINATE_HOEFFDING = 2
SPLIT_FINISH_DOMINATE_BOOTSTRAP = 3

Used in: TensorForestParams

optional DepthDependentParam prune_every_samples = 1
SplitPruningStrategyType type = 2

Pruning strategies define how candidates are pruned over time. SPLIT_PRUNE_HALF prunes the worst half of splits every prune_every_samples, etc. Note that prune_every_samples plays against the depth-dependent split_after_samples, so they should be set together.

Used in: SplitPruningConfig

SPLIT_PRUNE_NONE = 0
SPLIT_PRUNE_HALF = 1
SPLIT_PRUNE_QUARTER = 2
SPLIT_PRUNE_10_PERCENT = 3
SPLIT_PRUNE_HOEFFDING = 4
SPLIT_PRUNE_HOEFFDING prunes splits whose Gini impurity is worse than the best split's by more than the Hoeffding bound.
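
The fraction-based strategies (HALF, QUARTER, 10_PERCENT) can be pictured as dropping the worst-scoring share of candidates at each pruning step. The sketch below is not the TensorForest implementation: the function name, the use of a lower-is-better score, and rounding the number of dropped candidates down are all assumptions.

```python
def prune_worst(scores, fraction):
    """Return the indices of candidates kept after dropping the worst ones.

    `scores` holds one impurity-style score per candidate (lower is better).
    `fraction` is the share to drop, e.g. 0.5 for a "prune half" strategy.
    """
    num_to_drop = int(len(scores) * fraction)  # assumption: round down
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])  # best first
    kept = ranked[: len(scores) - num_to_drop]
    return sorted(kept)  # preserve the original candidate order


kept_after_half = prune_worst([0.1, 0.4, 0.2, 0.3], 0.5)
kept_after_tenth = prune_worst([0.1, 0.4, 0.2, 0.3], 0.1)
```

With only four candidates a 10 percent prune rounds down to zero removals, which is one reason the pruning interval and the number of candidates need to be chosen together.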

Stats models generally specify information that is collected which is necessary to choose a split at a node. Specifically, they operate on a SplitCandidate::LeafStat proto.

Used in: TensorForestParams

STATS_DENSE_GINI = 0
STATS_SPARSE_GINI = 1
STATS_LEAST_SQUARES_REGRESSION = 2
STATS_SPARSE_THEN_DENSE_GINI = 3
STATS_SPARSE_THEN_DENSE_GINI is deprecated and no longer supported.
STATS_FIXED_SIZE_SPARSE_GINI = 4

LeafModelType leaf_type = 1
------------ Types that control training subsystems ------ //
StatsModelType stats_type = 2
SplitCollectionType collection_type = 3
optional SplitPruningConfig pruning_type = 4
optional SplitFinishConfig finish_type = 5
int32 num_trees = 6
--------- Parameters that can't change by definition --------------- //
int32 max_nodes = 7
int32 num_features = 21
decision_trees.InequalityTest.Type inequality_test_type = 19
bool is_regression = 8
Some booleans controlling execution
bool drop_final_class = 9
bool collate_examples = 10
bool checkpoint_stats = 11
bool use_running_stats_method = 20
bool initialize_average_splits = 22
bool inference_tree_paths = 23
int32 num_outputs = 12
Number of classes (classification) or targets (regression)
optional DepthDependentParam num_splits_to_consider = 13
--------- Parameters that could be depth-dependent --------------- //
optional DepthDependentParam split_after_samples = 14
optional DepthDependentParam dominate_fraction = 15
optional DepthDependentParam min_split_samples = 18
string graph_dir = 16
--------- Parameters for experimental features ---------------------- //
int32 num_select_features = 17
int32 num_classes_to_track = 24
When using a FixedSizeSparseClassificationGrowStats, keep track of this many classes.

A parameter that is 'off' until depth >= a threshold, then is 'on'.

Used in: DepthDependentParam

float on_value = 1
float off_value = 2
float threshold = 3
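
A minimal sketch of the threshold behavior described above: the parameter returns off_value until depth reaches the threshold, and on_value from then on. The function name is hypothetical; the arguments mirror the proto fields.

```python
def threshold_param(depth, on_value, off_value, threshold):
    """Return on_value once depth >= threshold, otherwise off_value."""
    return on_value if depth >= threshold else off_value


# A parameter that is 0 for the first three levels and 10 from depth 3 on.
values_by_depth = [threshold_param(d, 10.0, 0.0, 3.0) for d in range(5)]
```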

Proto used for tracking tree paths during inference time.

repeated decision_trees.TreeNode nodes_visited = 1
Nodes are listed in the order in which they were traversed, i.e. nodes_visited[0] is the tree's root node.

package tensorflow.tensorforest

message DepthDependentParam

oneof ParamType

float constant_value = 1

LinearParam linear = 2

ExponentialParam exponential = 3

ThresholdParam threshold = 4

message ExponentialParam

float bias = 1

float base = 2

float multiplier = 3

float depth_multiplier = 4

message FertileSlot

optional LeafStat leaf_stats = 4

repeated SplitCandidate candidates = 1

optional LeafStat post_init_leaf_stats = 6

int32 node_id = 5

int32 depth = 7

message FertileStats

repeated FertileSlot node_to_slot = 1

message GiniStats

float square = 2

enum LeafModelType

MODEL_DENSE_CLASSIFICATION = 0

MODEL_SPARSE_CLASSIFICATION = 1

MODEL_REGRESSION = 2

MODEL_SPARSE_OR_DENSE_CLASSIFICATION = 3

message LeafStat

float weight_sum = 3

oneof leaf_stat

LeafStat.GiniImpurityClassificationStats classification = 1

LeafStat.LeastSquaresRegressionStats regression = 2

message LeafStat.GiniImpurityClassificationStats

oneof counts

decision_trees.Vector dense_counts = 1

decision_trees.SparseVector sparse_counts = 2

optional GiniStats gini = 3

message LeafStat.LeastSquaresRegressionStats

optional decision_trees.Vector mean_output = 1

optional decision_trees.Vector mean_output_squares = 2

message LinearParam

float slope = 1

float y_intercept = 2

float min_val = 3

float max_val = 4

message SplitCandidate

optional decision_trees.BinaryNode split = 1

optional LeafStat left_stats = 4

optional LeafStat right_stats = 5

string unique_id = 6

enum SplitCollectionType

COLLECTION_BASIC = 0

GRAPH_RUNNER_COLLECTION = 1

message SplitFinishConfig

optional DepthDependentParam check_every_steps = 1

SplitFinishStrategyType type = 2

enum SplitFinishStrategyType

SPLIT_FINISH_BASIC = 0

SPLIT_FINISH_DOMINATE_HOEFFDING = 2

SPLIT_FINISH_DOMINATE_BOOTSTRAP = 3

message SplitPruningConfig

optional DepthDependentParam prune_every_samples = 1

SplitPruningStrategyType type = 2

enum SplitPruningStrategyType

SPLIT_PRUNE_NONE = 0

SPLIT_PRUNE_HALF = 1

SPLIT_PRUNE_QUARTER = 2

SPLIT_PRUNE_10_PERCENT = 3

SPLIT_PRUNE_HOEFFDING = 4

enum StatsModelType

STATS_DENSE_GINI = 0

STATS_SPARSE_GINI = 1

STATS_LEAST_SQUARES_REGRESSION = 2

STATS_SPARSE_THEN_DENSE_GINI = 3

STATS_FIXED_SIZE_SPARSE_GINI = 4

message TensorForestParams

LeafModelType leaf_type = 1

StatsModelType stats_type = 2

SplitCollectionType collection_type = 3

optional SplitPruningConfig pruning_type = 4