package google.cloud.automl.v1

service AutoMl

service.proto:55

AutoML Server API. The resource names are assigned by the server. The server never reuses names that it has created after the resources with those names are deleted. An ID of a resource is the last element of the item's resource name. For example, for `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`, the ID of the item is `{dataset_id}`. Currently the only supported `location_id` is "us-central1". On any input that is documented to expect a string parameter in snake_case or kebab-case, either of those cases is accepted.
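
The rule that a resource ID is the last element of the resource name can be sketched as follows. This is a minimal illustration, not part of the API; the helper name `resource_id` and the sample project and dataset IDs are hypothetical.

```python
def resource_id(resource_name: str) -> str:
    """Return the last path element of a resource name (hypothetical helper)."""
    parts = resource_name.strip("/").split("/")
    if not parts[-1]:
        raise ValueError(f"empty resource name: {resource_name!r}")
    return parts[-1]


# Sample name following the documented pattern; the IDs are made up.
name = "projects/my-project/locations/us-central1/datasets/my_dataset_id"
print(resource_id(name))
```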

service PredictionService

prediction_service.proto:40

AutoML Prediction API. On any input that is documented to expect a string parameter in snake_case or kebab-case, either of those cases is accepted.

message AnnotationPayload

annotation_payload.proto:36

Contains annotation information that is relevant to AutoML.

Used in: PredictResponse

message BatchPredictInputConfig

io.proto:614

Input configuration for BatchPredict Action. The format of input depends on the ML problem of the model used for prediction. As input source the [gcs_source][google.cloud.automl.v1.InputConfig.gcs_source] is expected, unless specified otherwise. The formats are represented in EBNF with commas being literal and with non-terminal symbols defined near the end of this comment. The formats are: <h4>AutoML Natural Language</h4> <div class="ds-selector-tabs"><section><h5>Classification</h5> One or more CSV files where each line is a single column: GCS_FILE_PATH `GCS_FILE_PATH` is the Google Cloud Storage location of a text file. Supported file extensions: .TXT, .PDF Text files can be no larger than 10MB in size. Sample rows: gs://folder/text1.txt gs://folder/text2.pdf </section><section><h5>Sentiment Analysis</h5> One or more CSV files where each line is a single column: GCS_FILE_PATH `GCS_FILE_PATH` is the Google Cloud Storage location of a text file. Supported file extensions: .TXT, .PDF Text files can be no larger than 128kB in size. Sample rows: gs://folder/text1.txt gs://folder/text2.pdf </section><section><h5>Entity Extraction</h5> One or more JSONL (JSON Lines) files that provide either inline text or documents. You can only use one format, either inline text or documents, for a single call to [AutoMl.BatchPredict]. Each inline JSONL file contains, per line, a proto that wraps a temporary user-assigned TextSnippet ID (string up to 2000 characters long) called "id", a TextSnippet proto (in JSON representation) and zero or more TextFeature protos. Any given text snippet content must have 30,000 characters or fewer, and also be UTF-8 NFC encoded (ASCII already is). The IDs provided should be unique. Each document JSONL file contains, per line, a proto that wraps a Document proto with `input_config` set. Only PDF documents are currently supported, and each PDF document cannot exceed 2MB in size. 
Each JSONL file must not exceed 100MB in size, and no more than 20 JSONL files may be passed. Sample inline JSONL file (Shown with artificial line breaks. Actual line breaks are denoted by "\n".): { "id": "my_first_id", "text_snippet": { "content": "dog car cat"}, "text_features": [ { "text_segment": {"start_offset": 4, "end_offset": 6}, "structural_type": PARAGRAPH, "bounding_poly": { "normalized_vertices": [ {"x": 0.1, "y": 0.1}, {"x": 0.1, "y": 0.3}, {"x": 0.3, "y": 0.3}, {"x": 0.3, "y": 0.1}, ] }, } ], }\n { "id": "2", "text_snippet": { "content": "Extended sample content", "mime_type": "text/plain" } } Sample document JSONL file (Shown with artificial line breaks. Actual line breaks are denoted by "\n".): { "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://folder/document1.pdf" ] } } } }\n { "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://folder/document2.pdf" ] } } } } </section> </div> **Input field definitions:** `GCS_FILE_PATH` : The path to a file on Google Cloud Storage. For example, "gs://folder/video.avi". **Errors:** If any of the provided CSV files can't be parsed or if more than certain percent of CSV rows cannot be processed then the operation fails and prediction does not happen. Regardless of overall success or failure the per-row failures, up to a certain count cap, will be listed in Operation.metadata.partial_failures.
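
The inline entity extraction input described above can be assembled with a short script. The following is a minimal sketch that only emits the "id" and "text_snippet" fields and checks the documented limits on the client side; the function names and constants are hypothetical, and the service performs its own validation.

```python
import json
import unicodedata

MAX_ID_CHARS = 2000         # documented limit for the user-assigned "id"
MAX_CONTENT_CHARS = 30_000  # documented limit for text snippet content


def inline_jsonl_line(snippet_id: str, content: str) -> str:
    """Serialize one inline text snippet as a single JSONL line."""
    if len(snippet_id) > MAX_ID_CHARS:
        raise ValueError("id is longer than 2000 characters")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content is longer than 30,000 characters")
    if not unicodedata.is_normalized("NFC", content):
        raise ValueError("content is not NFC normalized")
    record = {"id": snippet_id, "text_snippet": {"content": content}}
    return json.dumps(record, ensure_ascii=False)


def build_inline_jsonl(snippets: dict[str, str]) -> str:
    """Join snippets into JSONL text; dict keys keep the IDs unique."""
    return "\n".join(inline_jsonl_line(i, c) for i, c in snippets.items())


print(build_inline_jsonl({"my_first_id": "dog car cat",
                          "2": "Extended sample content"}))
```

Write the result to a file encoded as UTF-8 before uploading it to Google Cloud Storage; the per-file size limit and the limit on the number of files are not checked here.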

Used in: BatchPredictOperationMetadata, BatchPredictRequest

message BatchPredictOperationMetadata

operations.proto:137

Details of BatchPredict operation.

Used in: OperationMetadata

message BatchPredictOperationMetadata.BatchPredictOutputInfo

operations.proto:142

Further describes this batch predict's output. Supplements [BatchPredictOutputConfig][google.cloud.automl.v1.BatchPredictOutputConfig].

Used in: BatchPredictOperationMetadata

message BatchPredictOutputConfig

io.proto:816

Output configuration for BatchPredict Action. As destination the [gcs_destination][google.cloud.automl.v1.BatchPredictOutputConfig.gcs_destination] must be set unless specified otherwise for a domain. If gcs_destination is set then in the given directory a new directory is created. Its name will be "prediction-<model-display-name>-<timestamp-of-prediction-call>", where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Its contents depend on the ML problem the predictions are made for. * For Text Classification: In the created directory files `text_classification_1.jsonl`, `text_classification_2.jsonl`,...,`text_classification_N.jsonl` will be created, where N may be 1, and depends on the total number of inputs and annotations found. Each .JSONL file will contain, per line, a JSON representation of a proto that wraps input text (or pdf) file in the text snippet (or document) proto and a list of zero or more AnnotationPayload protos (called annotations), which have classification detail populated. A single text (or pdf) file will be listed only once with all its annotations, and its annotations will never be split across files. If prediction for any text (or pdf) file failed (partially or completely), then additional `errors_1.jsonl`, `errors_2.jsonl`,..., `errors_N.jsonl` files will be created (N depends on total number of failed predictions). These files will have a JSON representation of a proto that wraps input text (or pdf) file followed by exactly one [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto) containing only `code` and `message`. * For Text Sentiment: In the created directory files `text_sentiment_1.jsonl`, `text_sentiment_2.jsonl`,...,`text_sentiment_N.jsonl` will be created, where N may be 1, and depends on the total number of inputs and annotations found. 
Each .JSONL file will contain, per line, a JSON representation of a proto that wraps input text (or pdf) file in the text snippet (or document) proto and a list of zero or more AnnotationPayload protos (called annotations), which have text_sentiment detail populated. A single text (or pdf) file will be listed only once with all its annotations, and its annotations will never be split across files. If prediction for any text (or pdf) file failed (partially or completely), then additional `errors_1.jsonl`, `errors_2.jsonl`,..., `errors_N.jsonl` files will be created (N depends on total number of failed predictions). These files will have a JSON representation of a proto that wraps input text (or pdf) file followed by exactly one [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto) containing only `code` and `message`. * For Text Extraction: In the created directory files `text_extraction_1.jsonl`, `text_extraction_2.jsonl`,...,`text_extraction_N.jsonl` will be created, where N may be 1, and depends on the total number of inputs and annotations found. The contents of these .JSONL file(s) depend on whether the input used inline text, or documents. If input was inline, then each .JSONL file will contain, per line, a JSON representation of a proto that wraps the text snippet's "id" given in the request (if specified), followed by the input text snippet, and a list of zero or more AnnotationPayload protos (called annotations), which have text_extraction detail populated. A single text snippet will be listed only once with all its annotations, and its annotations will never be split across files. 
If input used documents, then each .JSONL file will contain, per line, a JSON representation of a proto that wraps the document proto given in the request, followed by its OCR-ed representation in the form of a text snippet, finally followed by a list of zero or more AnnotationPayload protos (called annotations), which have text_extraction detail populated and refer, via their indices, to the OCR-ed text snippet. A single document (and its text snippet) will be listed only once with all its annotations, and its annotations will never be split across files. If prediction for any text snippet failed (partially or completely), then additional `errors_1.jsonl`, `errors_2.jsonl`,..., `errors_N.jsonl` files will be created (N depends on total number of failed predictions). These files will have a JSON representation of a proto that wraps either the "id" : "<id_value>" (in case of inline) or the document proto (in case of document) but here followed by exactly one [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto) containing only `code` and `message`.
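
When reading a batch prediction output directory, the result shards and the error shards can be told apart by file name alone. The sketch below assumes only the naming patterns listed above (`text_classification_N.jsonl`, `text_sentiment_N.jsonl`, `text_extraction_N.jsonl`, `errors_N.jsonl`); the function name is hypothetical, and listing the directory itself is left out.

```python
import re

# File name patterns taken from the description above.
SHARD_RE = re.compile(
    r"^(text_classification|text_sentiment|text_extraction|errors)_(\d+)\.jsonl$"
)


def split_shards(file_names):
    """Return (result_files, error_files), each sorted by shard number N."""
    results, errors = [], []
    for name in file_names:
        match = SHARD_RE.match(name)
        if match is None:
            continue  # not a prediction output shard
        kind, index = match.group(1), int(match.group(2))
        (errors if kind == "errors" else results).append((index, name))
    return ([n for _, n in sorted(results)], [n for _, n in sorted(errors)])


results, errors = split_shards(
    ["errors_1.jsonl", "text_sentiment_10.jsonl", "text_sentiment_2.jsonl"]
)
print(results, errors)
```

Sorting by the numeric suffix rather than by the raw string keeps shard 2 ahead of shard 10.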

Used in: BatchPredictRequest

message BatchPredictResult

prediction_service.proto:199

Result of the Batch Predict. This message is returned in [response][google.longrunning.Operation.response] of the operation returned by the [PredictionService.BatchPredict][google.cloud.automl.v1.PredictionService.BatchPredict].

message BoundingBoxMetricsEntry

detection.proto:43

Bounding box matching model metrics for a single intersection-over-union threshold and multiple label match confidence thresholds.

Used in: ImageObjectDetectionEvaluationMetrics
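
For readers unfamiliar with the intersection-over-union measure named above, here is a minimal sketch of the standard definition for two axis-aligned boxes. The tuple layout `(x_min, y_min, x_max, y_max)` and the function name are assumptions made for illustration; this is not the service's evaluation code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
print(iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

A predicted box is typically counted as matching a ground truth box when this value meets the chosen threshold.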

message BoundingBoxMetricsEntry.ConfidenceMetricsEntry

detection.proto:45

Metrics for a single confidence threshold.

Used in: BoundingBoxMetricsEntry

message BoundingPoly

geometry.proto:45

A bounding polygon of a detected object on a plane. On output both vertices and normalized_vertices are provided. The polygon is formed by connecting vertices in the order they are listed.

Used in: Document.Layout, ImageObjectDetectionAnnotation

message ClassificationAnnotation

classification.proto:43

Contains annotation details specific to classification.

Used in: AnnotationPayload

message ClassificationEvaluationMetrics

classification.proto:53

Model evaluation metrics for classification problems.

Used in: ModelEvaluation

message ClassificationEvaluationMetrics.ConfidenceMetricsEntry

classification.proto:55

Metrics for a single confidence threshold.

Used in: ClassificationEvaluationMetrics

message ClassificationEvaluationMetrics.ConfusionMatrix

classification.proto:117

Confusion matrix of the model running the classification.

Used in: ClassificationEvaluationMetrics, TextSentimentEvaluationMetrics

message ClassificationEvaluationMetrics.ConfusionMatrix.Row

classification.proto:119

Output only. A row in the confusion matrix.

Used in: ConfusionMatrix

enum ClassificationType

classification.proto:31

Type of the classification problem.

Used in: ImageClassificationDatasetMetadata, TextClassificationDatasetMetadata, TextClassificationModelMetadata

message CreateDatasetOperationMetadata

operations.proto:104

Details of CreateDataset operation.

Used in: OperationMetadata

(message has no fields)

message CreateModelOperationMetadata

operations.proto:109

Details of CreateModel operation.

Used in: OperationMetadata

(message has no fields)

message Dataset

dataset.proto:36

A workspace for solving a single, particular machine learning (ML) problem. A workspace contains examples that may be annotated.

Used as response type in: AutoMl.GetDataset, AutoMl.UpdateDataset

Used as field type in: CreateDatasetRequest, ListDatasetsResponse, UpdateDatasetRequest

message DeleteOperationMetadata

operations.proto:89

Details of operations that perform deletes of any entities.

Used in: OperationMetadata

(message has no fields)

message DeployModelOperationMetadata

operations.proto:94

Details of DeployModel operation.

Used in: OperationMetadata

(message has no fields)

message Document

data_items.proto:99

A structured text document e.g. a PDF.

Used in: ExamplePayload

message Document.Layout

data_items.proto:101

Describes the layout information of a [text_segment][google.cloud.automl.v1.Document.Layout.text_segment] in the document.

Used in: Document

enum Document.Layout.TextSegmentType

data_items.proto:103

The type of TextSegment in the context of the original document.

Used in: Layout

message DocumentDimensions

data_items.proto:72

Message that describes dimension of a document.

Used in: Document

enum DocumentDimensions.DocumentDimensionUnit

data_items.proto:74

Unit of the document dimension.

Used in: DocumentDimensions

message DocumentInputConfig

io.proto:630

Input configuration of a [Document][google.cloud.automl.v1.Document].

Used in: Document

message ExamplePayload

data_items.proto:186

Example data used for training or prediction.

Used in: PredictRequest, PredictResponse

message ExportDataOperationMetadata

operations.proto:119

Details of ExportData operation.

Used in: OperationMetadata

message ExportDataOperationMetadata.ExportDataOutputInfo

operations.proto:123

Further describes this export data's output. Supplements [OutputConfig][google.cloud.automl.v1.OutputConfig].

Used in: ExportDataOperationMetadata

message ExportModelOperationMetadata

operations.proto:160

Details of ExportModel operation.

Used in: OperationMetadata

message ExportModelOperationMetadata.ExportModelOutputInfo

operations.proto:164

Further describes the output of model export. Supplements [ModelExportOutputConfig][google.cloud.automl.v1.ModelExportOutputConfig].

Used in: ExportModelOperationMetadata

message GcsDestination

io.proto:884

The Google Cloud Storage location where the output is to be written to.

Used in: BatchPredictOutputConfig, ModelExportOutputConfig, OutputConfig

message GcsSource

io.proto:876

The Google Cloud Storage location for the input content.

Used in: BatchPredictInputConfig, DocumentInputConfig, ImageInputConfig, InputConfig

message Image

data_items.proto:37

A representation of an image. Only images up to 30MB in size are supported.

Used in: ExamplePayload

message ImageClassificationDatasetMetadata

image.proto:35

Dataset metadata that is specific to image classification.

Used in: Dataset

message ImageClassificationModelDeploymentMetadata

image.proto:175

Model deployment metadata specific to Image Classification.

Used in: DeployModelRequest

message ImageClassificationModelMetadata

image.proto:44

Model metadata for image classification.

Used in: Model

message ImageInputConfig

io.proto:623

Input configuration of an [Image][google.cloud.automl.v1.Image].

Used in: Image

message ImageObjectDetectionAnnotation

detection.proto:32

Annotation details for image object detection.

Used in: AnnotationPayload

message ImageObjectDetectionDatasetMetadata

image.proto:41

Dataset metadata specific to image object detection.

Used in: Dataset

(message has no fields)

message ImageObjectDetectionEvaluationMetrics

detection.proto:74

Model evaluation metrics for image object detection problems. Evaluates prediction quality of labeled bounding boxes.

Used in: ModelEvaluation

message ImageObjectDetectionModelDeploymentMetadata

image.proto:186

Model deployment metadata specific to Image Object Detection.

Used in: DeployModelRequest

message ImageObjectDetectionModelMetadata

image.proto:127

Model metadata specific to image object detection.

Used in: Model

message ImportDataOperationMetadata

operations.proto:114

Details of ImportData operation.

Used in: OperationMetadata

(message has no fields)

message InputConfig

io.proto:472

Input configuration for [AutoMl.ImportData][google.cloud.automl.v1.AutoMl.ImportData] action. The format of input depends on dataset_metadata the Dataset into which the import is happening has. As input source the [gcs_source][google.cloud.automl.v1.InputConfig.gcs_source] is expected, unless specified otherwise. Additionally any input .CSV file by itself must be 100MB or smaller, unless specified otherwise. If an "example" file (that is, image, video etc.) with identical content (even if it had different `GCS_FILE_PATH`) is mentioned multiple times, then its label, bounding boxes etc. are appended. The same file should be always provided with the same `ML_USE` and `GCS_FILE_PATH`, if it is not, then these values are nondeterministically selected from the given ones. The formats are represented in EBNF with commas being literal and with non-terminal symbols defined near the end of this comment. The formats are: <h4>AutoML Vision</h4> <div class="ds-selector-tabs"><section><h5>Classification</h5> See [Preparing your training data](https://cloud.google.com/vision/automl/docs/prepare) for more information. CSV file(s) with each line in format: ML_USE,GCS_FILE_PATH,LABEL,LABEL,... * `ML_USE` - Identifies the data set that the current row (file) applies to. This value can be one of the following: * `TRAIN` - Rows in this file are used to train the model. * `TEST` - Rows in this file are used to test the model during training. * `UNASSIGNED` - Rows in this file are not categorized. They are Automatically divided into train and test data. 80% for training and 20% for testing. * `GCS_FILE_PATH` - The Google Cloud Storage location of an image of up to 30MB in size. Supported extensions: .JPEG, .GIF, .PNG, .WEBP, .BMP, .TIFF, .ICO. * `LABEL` - A label that identifies the object in the image. For the `MULTICLASS` classification type, at most one `LABEL` is allowed per image. If an image has not yet been labeled, then it should be mentioned just once with no `LABEL`. 
Some sample rows: TRAIN,gs://folder/image1.jpg,daisy TEST,gs://folder/image2.jpg,dandelion,tulip,rose UNASSIGNED,gs://folder/image3.jpg,daisy UNASSIGNED,gs://folder/image4.jpg </section><section><h5>Object Detection</h5> See [Preparing your training data](https://cloud.google.com/vision/automl/object-detection/docs/prepare) for more information. CSV file(s) with each line in format: ML_USE,GCS_FILE_PATH,[LABEL],(BOUNDING_BOX | ,,,,,,,) * `ML_USE` - Identifies the data set that the current row (file) applies to. This value can be one of the following: * `TRAIN` - Rows in this file are used to train the model. * `TEST` - Rows in this file are used to test the model during training. * `UNASSIGNED` - Rows in this file are not categorized. They are automatically divided into train and test data: 80% for training and 20% for testing. * `GCS_FILE_PATH` - The Google Cloud Storage location of an image of up to 30MB in size. Supported extensions: .JPEG, .GIF, .PNG. Each image is assumed to be exhaustively labeled. * `LABEL` - A label that identifies the object in the image specified by the `BOUNDING_BOX`. * `BOUNDING_BOX` - The vertices of an object in the example image. The minimum allowed `BOUNDING_BOX` edge length is 0.01, and no more than 500 `BOUNDING_BOX` instances per image are allowed (one `BOUNDING_BOX` per line). If an image contains none of the objects being looked for, then it should be mentioned just once with no LABEL and the ",,,,,,," in place of the `BOUNDING_BOX`. **Four sample rows:** TRAIN,gs://folder/image1.png,car,0.1,0.1,,,0.3,0.3,, TRAIN,gs://folder/image1.png,bike,.7,.6,,,.8,.9,, UNASSIGNED,gs://folder/im2.png,car,0.1,0.1,0.2,0.1,0.2,0.3,0.1,0.3 TEST,gs://folder/im3.png,,,,,,,,, </section> </div> <h4>AutoML Natural Language</h4> <div class="ds-selector-tabs"><section><h5>Entity Extraction</h5> See [Preparing your training data](/natural-language/automl/entity-analysis/docs/prepare) for more information. 
One or more CSV file(s) with each line in the following format: ML_USE,GCS_FILE_PATH * `ML_USE` - Identifies the data set that the current row (file) applies to. This value can be one of the following: * `TRAIN` - Rows in this file are used to train the model. * `TEST` - Rows in this file are used to test the model during training. * `UNASSIGNED` - Rows in this file are not categorized. They are automatically divided into train and test data: 80% for training and 20% for testing. * `GCS_FILE_PATH` - Identifies a JSON Lines (.JSONL) file stored in Google Cloud Storage that contains in-line text or documents for model training. After the training data set has been determined from the `TRAIN` and `UNASSIGNED` CSV files, the training data is divided into train and validation data sets: 70% for training and 30% for validation. For example: TRAIN,gs://folder/file1.jsonl VALIDATE,gs://folder/file2.jsonl TEST,gs://folder/file3.jsonl **In-line JSONL files** In-line .JSONL files contain, per line, a JSON document that wraps a [`text_snippet`][google.cloud.automl.v1.TextSnippet] field followed by one or more [`annotations`][google.cloud.automl.v1.AnnotationPayload] fields, which have `display_name` and `text_extraction` fields to describe the entity from the text snippet. Multiple JSON documents can be separated using line breaks (\n). The supplied text must be annotated exhaustively. For example, if you include the text "horse", but do not label it as "animal", then "horse" is assumed to not be an "animal". Any given text snippet content must have 30,000 characters or fewer, and also be UTF-8 NFC encoded. ASCII is accepted as it is UTF-8 NFC encoded. 
For example: { "text_snippet": { "content": "dog car cat" }, "annotations": [ { "display_name": "animal", "text_extraction": { "text_segment": {"start_offset": 0, "end_offset": 2} } }, { "display_name": "vehicle", "text_extraction": { "text_segment": {"start_offset": 4, "end_offset": 6} } }, { "display_name": "animal", "text_extraction": { "text_segment": {"start_offset": 8, "end_offset": 10} } } ] }\n { "text_snippet": { "content": "This dog is good." }, "annotations": [ { "display_name": "animal", "text_extraction": { "text_segment": {"start_offset": 5, "end_offset": 7} } } ] } **JSONL files that reference documents** .JSONL files contain, per line, a JSON document that wraps a `input_config` that contains the path to a source PDF document. Multiple JSON documents can be separated using line breaks (\n). For example: { "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://folder/document1.pdf" ] } } } }\n { "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://folder/document2.pdf" ] } } } } **In-line JSONL files with PDF layout information** **Note:** You can only annotate PDF files using the UI. The format described below applies to annotated PDF files exported using the UI or `exportData`. In-line .JSONL files for PDF documents contain, per line, a JSON document that wraps a `document` field that provides the textual content of the PDF document and the layout information. 
For example: { "document": { "document_text": { "content": "dog car cat" } "layout": [ { "text_segment": { "start_offset": 0, "end_offset": 11, }, "page_number": 1, "bounding_poly": { "normalized_vertices": [ {"x": 0.1, "y": 0.1}, {"x": 0.1, "y": 0.3}, {"x": 0.3, "y": 0.3}, {"x": 0.3, "y": 0.1}, ], }, "text_segment_type": TOKEN, } ], "document_dimensions": { "width": 8.27, "height": 11.69, "unit": INCH, } "page_count": 3, }, "annotations": [ { "display_name": "animal", "text_extraction": { "text_segment": {"start_offset": 0, "end_offset": 3} } }, { "display_name": "vehicle", "text_extraction": { "text_segment": {"start_offset": 4, "end_offset": 7} } }, { "display_name": "animal", "text_extraction": { "text_segment": {"start_offset": 8, "end_offset": 11} } }, ], </section><section><h5>Classification</h5> See [Preparing your training data](https://cloud.google.com/natural-language/automl/docs/prepare) for more information. One or more CSV file(s) with each line in the following format: ML_USE,(TEXT_SNIPPET | GCS_FILE_PATH),LABEL,LABEL,... * `ML_USE` - Identifies the data set that the current row (file) applies to. This value can be one of the following: * `TRAIN` - Rows in this file are used to train the model. * `TEST` - Rows in this file are used to test the model during training. * `UNASSIGNED` - Rows in this file are not categorized. They are Automatically divided into train and test data. 80% for training and 20% for testing. * `TEXT_SNIPPET` and `GCS_FILE_PATH` are distinguished by a pattern. If the column content is a valid Google Cloud Storage file path, that is, prefixed by "gs://", it is treated as a `GCS_FILE_PATH`. Otherwise, if the content is enclosed in double quotes (""), it is treated as a `TEXT_SNIPPET`. For `GCS_FILE_PATH`, the path must lead to a file with supported extension and UTF-8 encoding, for example, "gs://folder/content.txt" AutoML imports the file content as a text snippet. 
For `TEXT_SNIPPET`, AutoML imports the column content excluding quotes. In both cases, size of the content must be 10MB or less in size. For zip files, the size of each file inside the zip must be 10MB or less in size. For the `MULTICLASS` classification type, at most one `LABEL` is allowed. The `ML_USE` and `LABEL` columns are optional. Supported file extensions: .TXT, .PDF, .ZIP A maximum of 100 unique labels are allowed per CSV row. Sample rows: TRAIN,"They have bad food and very rude",RudeService,BadFood gs://folder/content.txt,SlowService TEST,gs://folder/document.pdf VALIDATE,gs://folder/text_files.zip,BadFood </section><section><h5>Sentiment Analysis</h5> See [Preparing your training data](https://cloud.google.com/natural-language/automl/docs/prepare) for more information. CSV file(s) with each line in format: ML_USE,(TEXT_SNIPPET | GCS_FILE_PATH),SENTIMENT * `ML_USE` - Identifies the data set that the current row (file) applies to. This value can be one of the following: * `TRAIN` - Rows in this file are used to train the model. * `TEST` - Rows in this file are used to test the model during training. * `UNASSIGNED` - Rows in this file are not categorized. They are Automatically divided into train and test data. 80% for training and 20% for testing. * `TEXT_SNIPPET` and `GCS_FILE_PATH` are distinguished by a pattern. If the column content is a valid Google Cloud Storage file path, that is, prefixed by "gs://", it is treated as a `GCS_FILE_PATH`. Otherwise, if the content is enclosed in double quotes (""), it is treated as a `TEXT_SNIPPET`. For `GCS_FILE_PATH`, the path must lead to a file with supported extension and UTF-8 encoding, for example, "gs://folder/content.txt" AutoML imports the file content as a text snippet. For `TEXT_SNIPPET`, AutoML imports the column content excluding quotes. In both cases, size of the content must be 128kB or less in size. For zip files, the size of each file inside the zip must be 128kB or less in size. 
The `ML_USE` and `SENTIMENT` columns are optional. Supported file extensions: .TXT, .PDF, .ZIP * `SENTIMENT` - An integer between 0 and Dataset.text_sentiment_dataset_metadata.sentiment_max (inclusive). Describes the ordinal of the sentiment - higher value means a more positive sentiment. All the values are completely relative, i.e. neither 0 needs to mean a negative or neutral sentiment nor sentiment_max needs to mean a positive one - it is just required that 0 is the least positive sentiment in the data, and sentiment_max is the most positive one. The SENTIMENT shouldn't be confused with "score" or "magnitude" from the previous Natural Language Sentiment Analysis API. All SENTIMENT values between 0 and sentiment_max must be represented in the imported data. On prediction the same 0 to sentiment_max range will be used. The difference between neighboring sentiment values needs not to be uniform, e.g. 1 and 2 may be similar whereas the difference between 2 and 3 may be large. Sample rows: TRAIN,"@freewrytin this is way too good for your product",2 gs://folder/content.txt,3 TEST,gs://folder/document.pdf VALIDATE,gs://folder/text_files.zip,2 </section> </div> **Input field definitions:** `ML_USE` : ("TRAIN" | "VALIDATE" | "TEST" | "UNASSIGNED") Describes how the given example (file) should be used for model training. "UNASSIGNED" can be used when user has no preference. `GCS_FILE_PATH` : The path to a file on Google Cloud Storage. For example, "gs://folder/image1.png". `LABEL` : A display name of an object on an image, video etc., e.g. "dog". Must be up to 32 characters long and can consist only of ASCII Latin letters A-Z and a-z, underscores(_), and ASCII digits 0-9. For each label an AnnotationSpec is created which display_name becomes the label; AnnotationSpecs are given back in predictions. `BOUNDING_BOX` : (`VERTEX,VERTEX,VERTEX,VERTEX` | `VERTEX,,,VERTEX,,`) A rectangle parallel to the frame of the example (image, video). 
If 4 vertices are given, they are connected by edges in the order provided; if 2 are given, they are recognized as diagonally opposite vertices of the rectangle. `VERTEX` : (`COORDINATE,COORDINATE`) First coordinate is horizontal (x), the second is vertical (y). `COORDINATE` : A float in 0 to 1 range, relative to total length of image or video in given dimension. For fractions the leading non-decimal 0 can be omitted (i.e. 0.3 = .3). Point 0,0 is in top left. `TEXT_SNIPPET` : The content of a text snippet, UTF-8 encoded, enclosed within double quotes (""). `DOCUMENT` : A field that provides the textual content of the document and the layout information. **Errors:** If any of the provided CSV files can't be parsed or if more than a certain percentage of CSV rows cannot be processed, then the operation fails and nothing is imported. Regardless of overall success or failure, the per-row failures, up to a certain count cap, are listed in Operation.metadata.partial_failures.
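
The two `BOUNDING_BOX` forms (`VERTEX,VERTEX,VERTEX,VERTEX` and `VERTEX,,,VERTEX,,`) and the empty ",,,,,,," placeholder can be told apart by which of the eight coordinate cells are filled. The following is a minimal sketch of an object detection row parser; the function names are hypothetical, rows are split naively on commas (no quoted fields), and the minimum edge length and the per-image box limit are not checked.

```python
def parse_bounding_box(cells):
    """Return (x_min, y_min, x_max, y_max), or None for the empty placeholder."""
    if len(cells) != 8:
        raise ValueError("a BOUNDING_BOX spans exactly 8 CSV cells")
    filled = [i for i, cell in enumerate(cells) if cell != ""]
    if not filled:
        return None  # image listed without any object
    if filled == [0, 1, 4, 5]:
        # Two diagonally opposite vertices: VERTEX,,,VERTEX,,
        xs = [float(cells[0]), float(cells[4])]
        ys = [float(cells[1]), float(cells[5])]
    elif len(filled) == 8:
        # Four vertices: x values in even cells, y values in odd cells.
        xs = [float(c) for c in cells[0::2]]
        ys = [float(c) for c in cells[1::2]]
    else:
        raise ValueError("unrecognized BOUNDING_BOX layout")
    if any(not 0.0 <= v <= 1.0 for v in xs + ys):
        raise ValueError("COORDINATE values must lie in the 0 to 1 range")
    return (min(xs), min(ys), max(xs), max(ys))


def parse_row(line):
    """Split one row into (ml_use, gcs_file_path, label, box)."""
    fields = line.split(",")
    ml_use, path, label = fields[:3]
    return ml_use, path, label, parse_bounding_box(fields[3:])


print(parse_row("TRAIN,gs://folder/image1.png,car,0.1,0.1,,,0.3,0.3,,"))
```

Reducing both forms to the same min/max rectangle relies on the statement that the box is a rectangle parallel to the frame of the example.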

Used in: ImportDataRequest

message Model

model.proto:35

API proto representing a trained machine learning model.

Used as response type in: AutoMl.GetModel, AutoMl.UpdateModel

Used as field type in: CreateModelRequest, ListModelsResponse, UpdateModelRequest

enum Model.DeploymentState

model.proto:42

Deployment state of the model.

Used in: Model

message ModelEvaluation

model_evaluation.proto:37

Evaluation results of a model.

Used as response type in: AutoMl.GetModelEvaluation

Used as field type in: ListModelEvaluationsResponse

message ModelExportOutputConfig

io.proto:826

Output configuration for ModelExport Action.

Used in: ExportModelRequest

message NormalizedVertex

geometry.proto:34

A vertex represents a 2D point in the image. The normalized vertex coordinates are fractions between 0 and 1, relative to the original plane (image, video). For example, if the plane (e.g. the whole image) has size 10 x 20, then a point with normalized coordinates (0.1, 0.3) is at position (1, 6) on that plane.
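
The conversion in that example amounts to scaling each normalized coordinate by the matching plane dimension. A minimal sketch, with a hypothetical function name:

```python
def denormalize(x, y, width, height):
    """Map normalized (x, y) in the 0 to 1 range onto a width x height plane."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie between 0 and 1")
    return (x * width, y * height)


# The example from the text: (0.1, 0.3) on a 10 x 20 plane, i.e. about (1, 6).
print(denormalize(0.1, 0.3, 10, 20))
```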

Used in: BoundingPoly

message OperationMetadata

operations.proto:39

Metadata used across all long running operations returned by AutoML API.

message OutputConfig

io.proto:710

Output configuration for ExportData. As destination the [gcs_destination][google.cloud.automl.v1.OutputConfig.gcs_destination] must be set unless specified otherwise for a domain. If gcs_destination is set then in the given directory a new directory is created. Its name will be "export_data-<dataset-display-name>-<timestamp-of-export-call>", where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Only ground truth annotations are exported (annotations that have not been approved are not exported). The outputs correspond to how the data was imported, and may be used as input to import data. The output formats are represented as EBNF with literal commas and the same non-terminal symbol definitions as those in import data's [InputConfig][google.cloud.automl.v1.InputConfig]: * For Image Classification: CSV file(s) `image_classification_1.csv`, `image_classification_2.csv`,...,`image_classification_N.csv` with each line in format: ML_USE,GCS_FILE_PATH,LABEL,LABEL,... where GCS_FILE_PATHs point at the original, source locations of the imported images. For MULTICLASS classification type, there can be at most one LABEL per example. * For Image Object Detection: CSV file(s) `image_object_detection_1.csv`, `image_object_detection_2.csv`,...,`image_object_detection_N.csv` with each line in format: ML_USE,GCS_FILE_PATH,[LABEL],(BOUNDING_BOX | ,,,,,,,) where GCS_FILE_PATHs point at the original, source locations of the imported images. * For Text Classification: In the created directory CSV file(s) `text_classification_1.csv`, `text_classification_2.csv`, ...,`text_classification_N.csv` will be created where N depends on the total number of examples exported. Each line in the CSV is of the format: ML_USE,GCS_FILE_PATH,LABEL,LABEL,... where GCS_FILE_PATHs point at the exported .txt files containing the text content of the imported example. For MULTICLASS classification type, there will be at most one LABEL per example. 
* For Text Sentiment: In the created directory CSV file(s) `text_sentiment_1.csv`, `text_sentiment_2.csv`, ...,`text_sentiment_N.csv` will be created where N depends on the total number of examples exported. Each line in the CSV is of the format: ML_USE,GCS_FILE_PATH,SENTIMENT where GCS_FILE_PATHs point at the exported .txt files containing the text content of the imported example. * For Text Extraction: CSV file `text_extraction.csv`, with each line in format: ML_USE,GCS_FILE_PATH GCS_FILE_PATH leads to a .JSONL (i.e. JSON Lines) file which contains, per line, a proto that wraps a TextSnippet proto (in json representation) followed by AnnotationPayload protos (called annotations). If initially documents had been imported, the JSONL will point at the original, source locations of the imported documents. * For Translation: CSV file `translation.csv`, with each line in format: ML_USE,GCS_FILE_PATH GCS_FILE_PATH leads to a .TSV file which describes examples that have given ML_USE, using the following row format per line: TEXT_SNIPPET (in source language) \t TEXT_SNIPPET (in target language)

Used in: ExportDataRequest
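
The directory naming and row formats described for OutputConfig can be illustrated with a short Python sketch. It is not part of the AutoML client library: the function names (`export_dir_name`, `parse_classification_row`, `parse_translation_row`), the sample dataset name, and the sample bucket paths are all hypothetical, and the sketch assumes that the timestamp is rendered in UTC with millisecond precision, as the YYYY-MM-DDThh:mm:ss.sssZ pattern suggests.

```python
import csv
import io
from datetime import datetime, timezone


def export_dir_name(dataset_display_name: str, when: datetime) -> str:
    """Build 'export_data-<dataset-display-name>-<timestamp-of-export-call>'.

    Assumption: the timestamp is UTC with millisecond precision.
    """
    utc = when.astimezone(timezone.utc)
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
    return f"export_data-{dataset_display_name}-{stamp}"


def parse_classification_row(line: str) -> dict:
    """Split one 'ML_USE,GCS_FILE_PATH,LABEL,LABEL,...' CSV row into its parts."""
    fields = next(csv.reader(io.StringIO(line)))
    ml_use, gcs_file_path, *labels = fields
    # Drop empty trailing cells so an unlabeled row yields an empty list.
    return {
        "ml_use": ml_use,
        "gcs_file_path": gcs_file_path,
        "labels": [label for label in labels if label],
    }


def parse_translation_row(line: str) -> tuple:
    """Split one TSV row into (source-language text, target-language text)."""
    source_text, target_text = line.rstrip("\n").split("\t")
    return source_text, target_text


if __name__ == "__main__":
    call_time = datetime(2020, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc)
    print(export_dir_name("my_dataset", call_time))
    print(parse_classification_row("TRAIN,gs://folder/text1.txt,cat,dog"))
    print(parse_translation_row("Hello\tBonjour\n"))
```

Using the `csv` module rather than a plain `split(",")` keeps the parser tolerant of quoted fields, should a label or path ever contain a comma.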

message TextClassificationDatasetMetadata

text.proto:32

Dataset metadata for classification.

Used in: Dataset

message TextClassificationModelMetadata

text.proto:38

Model metadata that is specific to text classification.

Used in: Model

message TextExtractionAnnotation

text_extraction.proto:31

Annotation for identifying spans of text.

Used in: AnnotationPayload

message TextExtractionDatasetMetadata

text.proto:44

Dataset metadata that is specific to text extraction.

Used in: Dataset

(message has no fields)

message TextExtractionEvaluationMetrics

text_extraction.proto:46

Model evaluation metrics for text extraction problems.

Used in: ModelEvaluation

message TextExtractionEvaluationMetrics.ConfidenceMetricsEntry

text_extraction.proto:48

Metrics for a single confidence threshold.

Used in: TextExtractionEvaluationMetrics

message TextExtractionModelMetadata

text.proto:47

Model metadata that is specific to text extraction.

Used in: Model

(message has no fields)

message TextSegment

text_segment.proto:31

A contiguous part of a text (string), assuming it has a UTF-8 NFC encoding.

Used in: Document.Layout, TextExtractionAnnotation
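
Because a TextSegment assumes NFC-normalized text, it can help to normalize a string before selecting a contiguous part of it. The Python sketch below is only an illustration: the helper names `to_nfc` and `segment` are hypothetical, and treating the segment as a half-open range of character offsets is an assumption, since this page does not list the message's fields.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C (NFC)."""
    return unicodedata.normalize("NFC", text)


def segment(text: str, start: int, end: int) -> str:
    """Return the contiguous part [start, end) of the NFC-normalized text.

    Assumption: offsets count characters, with `end` exclusive.
    """
    return to_nfc(text)[start:end]


if __name__ == "__main__":
    # 'e' followed by a combining acute accent composes to a single 'é' in NFC.
    decomposed = "Cafe\u0301 au lait"
    print(len(decomposed), len(to_nfc(decomposed)))
    print(segment(decomposed, 0, 4))
```

Normalizing first matters because the same visible text can have different lengths in decomposed and composed form, so offsets computed on one form may not line up with the other.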

message TextSentimentAnnotation

text_sentiment.proto:32

Contains annotation details specific to text sentiment.

Used in: AnnotationPayload

message TextSentimentDatasetMetadata

text.proto:50

Dataset metadata for text sentiment.

Used in: Dataset

message TextSentimentEvaluationMetrics

text_sentiment.proto:49

Model evaluation metrics for text sentiment problems.

Used in: ModelEvaluation

message TextSentimentModelMetadata

text.proto:60

Model metadata that is specific to text sentiment.

Used in: Model

(message has no fields)

message TextSnippet

data_items.proto:57

A representation of a text snippet.

Used in: Document, ExamplePayload, TranslationAnnotation

message TranslationAnnotation

translation.proto:68

Annotation details specific to translation.

Used in: AnnotationPayload

message TranslationDatasetMetadata

translation.proto:33

Dataset metadata that is specific to translation.

Used in: Dataset

message TranslationEvaluationMetrics

translation.proto:42

Evaluation metrics for the dataset.

Used in: ModelEvaluation

message TranslationModelMetadata

translation.proto:51

Model metadata that is specific to translation.

Used in: Model

message UndeployModelOperationMetadata

operations.proto:99

Details of UndeployModel operation.

Used in: OperationMetadata

(message has no fields)