An ArraysExtraInfo message stores a collection of additional information about arrays in a model, complementing the information in the model itself. It is intentionally a separate message so that it may be serialized and passed separately from the model. See --arrays_extra_info_file. A typical use case is to manually specify MinMax for specific arrays in a model that does not already contain such MinMax information.
Next ID to use: 8.
Supported I/O file formats. Some formats may be input-only or output-only.
GraphDef, third_party/tensorflow/core/framework/graph.proto
TensorFlow's mobile inference model. third_party/tensorflow/contrib/tflite/schema.fbs
GraphViz. Export-only.
IODataType describes the numeric data types of input and output arrays of a model.
Float32, not quantized
Uint8, quantized
Int32, not quantized
Int64, not quantized
String, not quantized
Int16, quantized
Boolean
Complex64, not quantized
Int8, quantized based on QuantizationParameters in schema.
Half precision float, not quantized.
Next ID to use: 7.
Name of the input arrays, i.e. the arrays from which input activations will be read.
Shape of the input. For many applications the dimensions are {batch, height, width, depth}. Often the batch is left "unspecified" by providing a value of -1. The last dimension is typically called 'depth' or 'channels'. For example, for an image model taking RGB images as input, this would have the value 3.
mean_value and std_value parameters control the interpretation of raw input activation values (elements of the input array) as real numbers. The mapping is given by:

  real_value = (raw_input_value - mean_value) / std_value

In particular, the defaults (mean_value=0, std_value=1) yield real_value = raw_input_value. Often, non-default values are used in image models. For example, an image model taking uint8 image channel values as its raw inputs, in the [0, 255] range, may use mean_value=128, std_value=128 to map them into the interval [-1, 1). Note: this matches exactly the meaning of mean_value and std_value in (TensorFlow via LegacyFedInput).
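The mapping above is simple enough to sketch directly; the function name below is ours, not part of the proto:

```python
def raw_to_real(raw_input_value, mean_value=0.0, std_value=1.0):
    """Interpret a raw input activation value as a real number."""
    return (raw_input_value - mean_value) / std_value

# With the defaults (mean_value=0, std_value=1) the mapping is the identity:
assert raw_to_real(100) == 100.0

# uint8 image channels in [0, 255] with mean_value=128, std_value=128
# map into the interval [-1, 1):
assert raw_to_real(0, 128, 128) == -1.0
assert raw_to_real(255, 128, 128) == 127 / 128
```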
Data type of the input. In many graphs, the input arrays already have defined data types, e.g. Placeholder nodes in a TensorFlow GraphDef have a dtype attribute. In those cases, it is not necessary to specify this data_type flag. The purpose of this flag is only to define the data type of input arrays whose type isn't defined in the input graph file, for example when specifying an arbitrary (non-Placeholder) --input_array into a TensorFlow GraphDef. When this data_type is quantized (e.g. QUANTIZED_UINT8), the corresponding quantization parameters are the mean_value and std_value fields.

It is also important to understand the nuance between this data_type flag and the inference_input_type in TocoFlags. The basic difference is that this data_type (like all ModelFlags) describes a property of the input graph, while inference_input_type (like all TocoFlags) describes an aspect of the toco transformation process and thus of the output file. The types of input arrays may differ between the input and output files if quantization or dequantization occurred. Such differences can only occur for real-number data, i.e. only between FLOAT and quantized types (e.g. QUANTIZED_UINT8).
ModelFlags encodes properties of a model that, depending on the file format, may or may not be recorded in the model file. The purpose of representing these properties in ModelFlags is to allow passing them separately from the input model file, for instance as command-line parameters, so that we can offer a single uniform interface that can handle files from different input formats. For each of these properties, and each supported file format, we detail in comments below whether the property exists in the given file format.

Obsolete flags that have been removed:
  optional int32 input_depth = 3;
  optional int32 input_width = 4;
  optional int32 input_height = 5;
  optional int32 batch = 6 [default = 1];
  optional float mean_value = 7;
  optional float std_value = 8 [default = 1.];
  optional int32 input_dims = 11 [default = 4];
  repeated int32 input_shape = 13;

Next ID to use: 20.
Information about the input arrays, i.e. the arrays from which input activations will be read.
Name of the output arrays, i.e. the arrays into which output activations will be written.
If true, the model accepts an arbitrary batch size. Mutually exclusive with the 'batch' field: at most one of these two fields can be set.
If true, nonexistent arrays may be passed in --input_arrays and --output_arrays. This makes little sense and is only useful to more easily obtain graph visualizations.
If true, non-ASCII-printable characters may be passed in --input_arrays and --output_arrays. By default (if false), only ASCII printable characters are allowed, i.e. character codes ranging from 32 to 127. This is disallowed by default so as to catch common copy-and-paste issues where invisible Unicode characters are unwittingly added to these strings.
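The default check is easy to state as code; this is a sketch of the documented 32-to-127 rule, not toco's actual implementation:

```python
def is_ascii_printable(array_name):
    # Only character codes 32..127 are allowed by default, per the
    # allow_nonascii_arrays documentation.
    return all(32 <= ord(c) <= 127 for c in array_name)

assert is_ascii_printable("input_tensor:0")
# A pasted-in invisible zero-width space (U+200B) fails the check:
assert not is_ascii_printable("input\u200btensor")
```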
If set, this ArraysExtraInfo allows passing extra information about arrays not specified in the input model file, such as extra MinMax information.
When set to false, toco will not change the input ranges and the output ranges of the concat operator to the overlap of all input ranges.
Checks applied to the model, typically after toco's comprehensive graph transformations. Next ID to use: 4.
Use the name of a type of operator to check its counts. Use "Total" for overall operator counts. Use "Arrays" for overall array counts.
A count of zero is a meaningful check, so a negative value is used to mean "disabled".
If count_max < count_min, then count_min is the only allowed value.
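One plausible reading of these count semantics can be sketched as follows; the function name and exact disable condition are our interpretation, not quoted from toco:

```python
def check_passes(actual_count, count_min=-1, count_max=-1):
    """Evaluate one ModelCheck against an observed operator/array count."""
    # A negative count_min disables the check (zero is a meaningful bound).
    if count_min < 0:
        return True
    # If count_max < count_min, count_min is the only allowed value.
    if count_max < count_min:
        return actual_count == count_min
    return count_min <= actual_count <= count_max

assert check_passes(5)                 # default bounds: check disabled
assert check_passes(0, 0, 0)           # a count of zero is a real check
assert not check_passes(3, 2, -1)      # exact-count mode: only 2 passes
```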
size allows specifying a 1-D shape for the RNN state array. It will be expanded with 1's to fit the model. TODO(benoitjacob): should allow a generic, explicit shape.
TocoFlags encodes extra parameters that drive tooling operations and that are not normally encoded in model files. In general they may not be thought of as properties of models; instead, they describe how models are to be processed in the context of the present tooling job. Next ID to use: 31.
Input file format
Output file format
Similar to inference_type, but allows controlling specifically the quantization of input arrays, separately from other arrays. If not set, then the value of inference_type is implicitly used, i.e. by default input arrays are quantized like other arrays. Like inference_type, this only affects real-number arrays. By "real-number" we mean float arrays and quantized arrays. This excludes plain integer arrays, string arrays, and every other data type. The typical use for this flag is for vision models taking a bitmap as input, typically with uint8 channels, yet still requiring floating-point inference. For such image models, the uint8 input is quantized, i.e. the uint8 values are interpreted as real numbers, and the quantization parameters used for such input arrays are their mean_value and std_value parameters.
Sets the type of real-number arrays in the output file, that is, controls the representation (quantization) of real numbers in the output file, except for input arrays, which are controlled by inference_input_type.

NOTE: this flag only impacts real-number arrays. By "real-number" we mean float arrays and quantized arrays. This excludes plain integer arrays, string arrays, and every other data type. For real-number arrays, the impact of this flag is to allow the output file to choose a different real-number representation (quantization) from what the input file used. For any other type of array, changing the data type would not make sense. Specifically:
- If FLOAT, then real-number arrays will be of type float in the output file. If they were quantized in the input file, then they get dequantized.
- If QUANTIZED_UINT8, then real-number arrays will be quantized as uint8 in the output file. If they were float in the input file, then they get quantized.
- If not set, then all real-number arrays retain the same type in the output file as they have in the input file.
default_ranges_min and default_ranges_max are helpers to experiment with quantization of models. Normally, quantization requires the input model to have (min, max) range information for every activations array. This is needed in order to know how to quantize arrays and still achieve satisfactory accuracy. However, in some circumstances one would just like to estimate the performance of quantized inference, without caring about accuracy. That is what default_ranges_min and default_ranges_max are for: when specified, they will be used as default (min, max) range boundaries for all activation arrays that lack (min, max) range information, thus allowing for quantization to proceed. It should be clear from the above explanation that these parameters are for experimentation purposes only and should not be used in production: they make it easy to quantize models, but the resulting quantized model will be inaccurate. These values only apply to arrays quantized with the kUint8 data type.
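As a sketch of how a default (min, max) range would let quantization proceed, the following derives uint8 affine quantization parameters from a range. The formulas are a standard affine quantization scheme, not quoted from toco, and the function names are ours:

```python
def uint8_quantization_params(range_min, range_max):
    """Derive (scale, zero_point) for affine uint8 quantization of a range."""
    scale = (range_max - range_min) / 255.0
    zero_point = round(-range_min / scale)
    return scale, zero_point

def quantize(real_value, scale, zero_point):
    """Map a real number to its uint8 representation, clamping to [0, 255]."""
    q = round(real_value / scale) + zero_point
    return min(255, max(0, q))

# An experiment-only default range, e.g. --default_ranges_min=-6
# --default_ranges_max=6, applied to an array lacking MinMax info:
scale, zp = uint8_quantization_params(-6.0, 6.0)
assert 0 <= zp <= 255
assert quantize(0.0, scale, zp) == zp     # real 0 maps to the zero point
assert quantize(-100.0, scale, zp) == 0   # out-of-range values are clamped
```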
Equivalent versions of default_ranges_min/_max for arrays quantized with the kInt16 data type.
Ignore and discard FakeQuant nodes. For instance, this can be used to generate plain float code without fake-quantization from a quantized graph.
Normally, FakeQuant nodes must be strict boundaries for graph transformations, in order to ensure that quantized inference has the exact same arithmetic behavior as quantized training --- which is the whole point of quantized training and of FakeQuant nodes in the first place. However, that entails subtle requirements on where exactly FakeQuant nodes must be placed in the graph. Some quantized graphs have FakeQuant nodes at unexpected locations, that prevent graph transformations that are necessary in order to generate inference code for these graphs. Such graphs should be fixed, but as a temporary work-around, setting this reorder_across_fake_quant flag allows toco to perform necessary graph transformations on them, at the cost of no longer faithfully matching inference and training arithmetic.
If true, allow TOCO to create TF Lite Custom operators for all the unsupported TensorFlow ops.
Applies only to the case when the input format is TENSORFLOW_GRAPHDEF. If true, then control dependencies will be immediately dropped during import. If not set, the default behavior is as follows:
- Default to false if the output format is TENSORFLOW_GRAPHDEF.
- Default to true in all other cases.
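The default-resolution logic described above can be sketched as follows; the function name and the string placeholders for format names are ours:

```python
def drop_control_dependencies(flag_value, output_format):
    """Resolve the effective value of the drop-control-dependency flag."""
    if flag_value is not None:
        return flag_value  # explicitly set by the user: honor it
    # Unset: keep control dependencies only when the output is also a
    # TensorFlow GraphDef, i.e. when round-tripping GraphDefs.
    return output_format != "TENSORFLOW_GRAPHDEF"

assert drop_control_dependencies(None, "TENSORFLOW_GRAPHDEF") is False
assert drop_control_dependencies(None, "TFLITE") is True
assert drop_control_dependencies(True, "TENSORFLOW_GRAPHDEF") is True
```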
Disables transformations that fuse subgraphs such as known LSTMs (not all LSTMs are identified).
Uses the FakeQuantWithMinMaxArgs.num_bits attribute to adjust quantized array data types throughout the graph. The graph must be properly annotated with FakeQuant* ops on at least the edges and may contain additional ops on the interior of the graph to widen/narrow as desired. Input and output array data types may change because of this propagation and users must be sure to query the final data_type values.
Some fast uint8 GEMM kernels require uint8 weights to avoid the value 0. This flag allows nudging them to 1 to allow proceeding, with moderate inaccuracy.
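A minimal sketch of the nudge, assuming the quantized weights are available as a plain list of uint8 values (the function name is ours):

```python
def nudge_uint8_weights(weights):
    """Replace the value 0 with 1 in quantized uint8 weights, for GEMM
    kernels that cannot accept 0. Introduces a small quantization error."""
    return [1 if w == 0 else w for w in weights]

assert nudge_uint8_weights([0, 17, 255, 0]) == [1, 17, 255, 1]
```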
Minimum size of constant arrays to deduplicate; smaller arrays will not be deduplicated.
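Deduplication subject to a minimum size can be sketched as follows; this is an illustration of the idea, not toco's implementation, and the names are ours:

```python
def deduplicate_constants(arrays, min_size):
    """Map equal constant arrays of at least min_size elements onto one
    canonical name; arrays below the threshold are left alone."""
    canonical = {}  # array contents -> first name seen with those contents
    remap = {}      # array name -> canonical name
    for name, values in arrays.items():
        if len(values) < min_size:
            continue  # too small: not worth the indirection of sharing
        key = tuple(values)
        remap[name] = canonical.setdefault(key, name)
    return remap

arrays = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [9]}
# "b" is folded into "a"; "c" is below the threshold and untouched:
assert deduplicate_constants(arrays, min_size=2) == {"a": "a", "b": "a"}
```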
Split the LSTM inputs from 5 tensors to 18 tensors for TFLite. Ignored if the output format is not TFLite.
Store weights as quantized weights followed by dequantize operations. Computation is still done in float, but reduces model size (at the cost of accuracy and latency). DEPRECATED: Please use post_training_quantize instead.
Full filepath of the folder in which to dump GraphViz .dot files of the graph at various stages of processing. Preferred over --output_format=GRAPHVIZ_DOT in order to keep the requirements of the output file.
Boolean indicating whether to dump the graph after every graph transformation.
Boolean indicating whether to quantize the weights of the converted float model. Model size will be reduced and there will be latency improvements (at the cost of accuracy).
This flag only works when converting to TensorFlow Lite format. When enabled, unsupported ops will be converted to select TensorFlow ops. TODO(ycling): Consider renaming the following 2 flags and don't call it "Flex". `enable_select_tf_ops` should always be used with `allow_custom_ops`. WARNING: Experimental interface, subject to change
This flag only works when converting to TensorFlow Lite format. When enabled, all TensorFlow ops will be converted to select TensorFlow ops. This will force `enable_select_tf_ops` to true. `force_select_tf_ops` should always be used with `enable_select_tf_ops`. WARNING: Experimental interface, subject to change
Boolean indicating whether to convert float32 constant buffers to float16. This is typically done to reduce model size. Delegates may also wish to implement kernels on reduced precision floats for performance gains.
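The precision loss this trades for a smaller model can be demonstrated with the standard library alone, since `struct` supports IEEE half precision via the 'e' format; the function name is ours:

```python
import struct

def round_to_float16(x):
    """Round a float to IEEE 754 half precision and back, showing the
    precision lost when float32 constant buffers are stored as float16."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Values exactly representable in half precision survive unchanged:
assert round_to_float16(1.0) == 1.0
# Most values do not; float16 keeps only an 11-bit significand:
assert round_to_float16(3.14159265) != 3.14159265
```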
Boolean flag indicating whether the converter should allow models with dynamic Tensor shape. When set to False, the converter will generate runtime memory offsets for activation Tensors (with 128 bits alignment) and error out on models with undetermined Tensor shape. (Default: True)