Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.
LRO endpoint to batch process many documents.
Request to batch process documents as an asynchronous operation.
Required. Individual requests for each document.
Target project and location to make a call. Format: `projects/{project-id}/locations/{location-id}`. If no location is specified, a region will be chosen automatically.
Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.
Responses for each individual document.
A bounding polygon for the detected image annotation.
Used in:
The bounding polygon vertices.
The bounding polygon normalized vertices.
Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.
Original source document from the user.
Currently supports Google Cloud Storage URI of the form `gs://bucket_name/object_name`. Object versioning is not supported. See [Google Cloud Storage Request URIs](https://cloud.google.com/storage/docs/reference-uris) for more info.
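A `gs://bucket_name/object_name` URI as described above can be split into its bucket and object parts with a simple helper (a sketch for illustration; the function name is hypothetical):

```python
def parse_gcs_uri(uri: str):
    """Split a `gs://bucket_name/object_name` URI into (bucket, object)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Google Cloud Storage URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the object name.
    bucket, _, obj = uri[len(prefix):].partition("/")
    return bucket, obj
```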
Inline document content, represented as a stream of bytes. Note: As with all `bytes` fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.
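As an illustration of the note above, inline `bytes` content must be base64-encoded when the message is expressed as JSON. A minimal sketch (the helper names are hypothetical, not part of the API):

```python
import base64

def content_to_json_value(raw_bytes: bytes) -> str:
    """Encode raw document bytes as the base64 string the JSON mapping expects."""
    return base64.b64encode(raw_bytes).decode("ascii")

def content_from_json_value(encoded: str) -> bytes:
    """Decode a base64 JSON value back into the raw bytes of the document."""
    return base64.b64decode(encoded)
```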
An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml.
UTF-8 encoded text in reading order from the document.
Styles for the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Visual page layout for the [Document][google.cloud.documentai.v1beta1.Document].
A list of entities detected on [Document.text][google.cloud.documentai.v1beta1.Document.text]. For document shards, entities in this list may cross shard boundaries.
Relationship among [Document.entities][google.cloud.documentai.v1beta1.Document.entities].
Information about sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified.
Any error that occurred while processing this document.
A phrase in the text that is a known entity type, such as a person, an organization, or a location.
Used in:
Provenance of the entity. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Entity type from a schema e.g. `Address`.
Text value in the document e.g. `1600 Amphitheatre Pkwy`.
Canonical mention name. This will be a unique value in the entity list for this document.
Relationship between [Entities][google.cloud.documentai.v1beta1.Document.Entity].
Used in:
Subject entity mention_id.
Object entity mention_id.
Relationship description.
A page in a [Document][google.cloud.documentai.v1beta1.Document].
Used in:
1-based index for current [Page][google.cloud.documentai.v1beta1.Document.Page] in a parent [Document][google.cloud.documentai.v1beta1.Document]. Useful when a page is taken out of a [Document][google.cloud.documentai.v1beta1.Document] for individual processing.
Physical dimension of the page.
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for the page.
A list of detected languages together with confidence.
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.
A list of visually detected tokens on the page.
A list of detected non-text visual elements on the page, e.g. checkboxes and signatures.
A list of visually detected tables on the page.
A list of visually detected form fields on the page.
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Block][google.cloud.documentai.v1beta1.Document.Page.Block].
A list of detected languages together with confidence.
Detected language for a structural component.
Used in:
The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.
Confidence of detected language. Range [0, 1].
Dimension for the page.
Used in:
Page width.
Page height.
Dimension unit.
A form field detected on the page.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta1.Document.Page.FormField] name. e.g. `Address`, `Email`, `Grand total`, `Phone number`, etc.
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta1.Document.Page.FormField] value.
A list of detected languages for name together with confidence.
A list of detected languages for value together with confidence.
Visual element describing a layout unit on a page.
Used in:
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Confidence of the current [Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].
The bounding polygon for the [Layout][google.cloud.documentai.v1beta1.Document.Page.Layout].
Detected orientation for the [Layout][google.cloud.documentai.v1beta1.Document.Page.Layout].
Detected human reading orientation.
Used in:
Unspecified orientation.
Orientation is aligned with page up.
Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read.
Orientation is aligned with page down. Turn the head 180 degrees from upright to read.
Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read.
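The head-turn descriptions above can be tabulated directly. A minimal sketch (the string keys and dictionary name are simplified stand-ins for the proto enum values):

```python
# Degrees of clockwise head turn from upright needed to read text,
# per the orientation descriptions above.
HEAD_TURN_CW_DEGREES = {
    "ORIENTATION_UNSPECIFIED": None,  # unknown orientation
    "PAGE_UP": 0,       # already upright
    "PAGE_RIGHT": 90,   # turn head 90 degrees clockwise
    "PAGE_DOWN": 180,   # upside down
    "PAGE_LEFT": 270,   # 90 degrees counterclockwise
}
```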
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Line][google.cloud.documentai.v1beta1.Document.Page.Line].
A list of detected languages together with confidence.
A collection of lines that a human would perceive as a paragraph.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Paragraph][google.cloud.documentai.v1beta1.Document.Page.Paragraph].
A list of detected languages together with confidence.
A table representation similar to HTML table structure.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Table][google.cloud.documentai.v1beta1.Document.Page.Table].
Header rows of the table.
Body rows of the table.
A list of detected languages together with confidence.
A cell representation inside the table.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [TableCell][google.cloud.documentai.v1beta1.Document.Page.Table.TableCell].
How many rows this cell spans.
How many columns this cell spans.
A list of detected languages together with confidence.
A row of table cells.
Used in:
Cells that make up this row.
A detected token.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Token][google.cloud.documentai.v1beta1.Document.Page.Token].
Detected break at the end of a [Token][google.cloud.documentai.v1beta1.Document.Page.Token].
A list of detected languages together with confidence.
Detected break at the end of a [Token][google.cloud.documentai.v1beta1.Document.Page.Token].
Used in:
Detected break type.
Enum to denote the type of break found.
Used in:
Unspecified break type.
A single whitespace.
A wider whitespace.
A hyphen that indicates that a token has been split across lines.
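The hyphen break type above signals that a token was split across lines, so the two halves should be joined without a separator when reconstructing text. A minimal sketch, using (text, break_type) pairs as a simplified stand-in for Token and DetectedBreak:

```python
def join_tokens(tokens):
    """Rejoin token texts using their detected break types.

    `tokens` is a list of (text, break_type) pairs, where break_type is one of
    "SPACE", "WIDE_SPACE", "HYPHEN", or None (no break).
    """
    out = []
    for text, break_type in tokens:
        out.append(text)
        if break_type in ("SPACE", "WIDE_SPACE"):
            out.append(" ")
        # A HYPHEN break means the token continues on the next line,
        # so nothing is inserted and the halves join directly.
    return "".join(out)
```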
A detected non-text visual element on the page, e.g. a checkbox or signature.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [VisualElement][google.cloud.documentai.v1beta1.Document.Page.VisualElement].
Type of the [VisualElement][google.cloud.documentai.v1beta1.Document.Page.VisualElement].
A list of detected languages together with confidence.
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
Used in:
The 0-based index of this shard.
Total number of shards.
The index of the first character in [Document.text][google.cloud.documentai.v1beta1.Document.text] in the overall document global text.
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
Used in:
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Text color.
Text background color.
Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp
Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp
Text decoration. Follows CSS standard. <text-decoration-line> <text-decoration-color> <text-decoration-style> https://www.w3schools.com/cssref/pr_text_text-decoration.asp
Font size.
Font size with unit.
Used in:
Font size for the text.
Unit for the font size. Follows CSS naming (in, px, pt, etc.).
Text reference indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Used in:
The text segments from the [Document.text][google.cloud.documentai.v1beta1.Document.text].
A text segment in the [Document.text][google.cloud.documentai.v1beta1.Document.text]. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See [ShardInfo.text_offset][google.cloud.documentai.v1beta1.Document.ShardInfo.text_offset].
Used in:
[TextSegment][google.cloud.documentai.v1beta1.Document.TextAnchor.TextSegment] start UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta1.Document.text].
[TextSegment][google.cloud.documentai.v1beta1.Document.TextAnchor.TextSegment] half open end UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta1.Document.text].
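Because the end index is half open (exclusive), a segment's text can be recovered with an ordinary slice. A minimal sketch, assuming indices count Unicode characters the same way Python string indices do:

```python
def segment_text(document_text: str, start_index: int, end_index: int) -> str:
    """Extract the text covered by a TextSegment.

    end_index is exclusive (half open), matching the proto's convention,
    so it maps directly onto Python slicing.
    """
    return document_text[start_index:end_index]
```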
Parameters to control entity extraction behavior.
Used in:
Whether to enable entity extraction.
Model version of the entity extraction. Default is "builtin/stable". Specify "builtin/latest" for the latest model.
Parameters to control form extraction behavior.
Used in:
Whether to enable form extraction.
The user can provide pairs of (key text, value type) to improve the parsing result. For example, if a document has a field called "Date" that holds a date value and a field called "Amount" that may hold either a currency value (e.g. "$500.00") or a simple number value (e.g. "20"), you could use the following hints:

[
  {"key": "Date", "value_types": ["DATE"]},
  {"key": "Amount", "value_types": ["PRICE", "NUMBER"]}
]

If the value type is unknown but you want to provide hints for the keys, you can leave the value_types field empty, e.g. {"key": "Date", "value_types": []}.
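The hint structure above can be built programmatically. A minimal sketch (the helper name is hypothetical; the dict shape mirrors the JSON example):

```python
def key_value_hint(key: str, value_types=None) -> dict:
    """Build one (key text, value types) hint in the shape shown above.

    Pass an empty list (or nothing) when the value type is unknown.
    """
    return {"key": key, "value_types": list(value_types or [])}

# The example from the documentation, expressed as Python dicts.
hints = [
    key_value_hint("Date", ["DATE"]),
    key_value_hint("Amount", ["PRICE", "NUMBER"]),
]
```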
Model version of the form extraction system. Default is "builtin/stable". Specify "builtin/latest" for the latest model.
The Google Cloud Storage location where the output file will be written to.
Used in:
The Google Cloud Storage location where the input file will be read from.
Used in:
The desired input location and metadata.
Used in:
Required.
The Google Cloud Storage location to read the input from. This must be a single file.
Required. MIME type of the input. Currently supported MIME types are application/pdf, image/tiff, and image/gif.
User-provided hint for key value pair.
Used in:
The key text for the hint.
Type of the value. This is case-insensitive, and could be one of: ADDRESS, LOCATION, ORGANIZATION, PERSON, PHONE_NUMBER, ID, NUMBER, EMAIL, PRICE, TERMS, DATE, NAME. Types not in this list will be ignored.
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
Used in:
X coordinate.
Y coordinate.
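A normalized vertex is obtained from an absolute pixel vertex by dividing each coordinate by the corresponding image dimension, so the result lies in [0, 1]. A minimal sketch (the function name is hypothetical):

```python
def normalize_vertex(x: float, y: float, image_width: float, image_height: float):
    """Convert an absolute pixel vertex to a normalized vertex whose
    coordinates are relative to the original image, in the range [0, 1]."""
    return (x / image_width, y / image_height)
```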
Parameters to control Optical Character Recognition (OCR) behavior.
Used in:
List of languages to use for OCR. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting `language_hints` is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Document processing returns an error if one or more of the specified languages is not one of the supported languages.
Contains metadata for the BatchProcessDocuments operation.
The state of the current batch processing.
A message providing more details about the current state of processing.
The creation time of the operation.
The last update time of the operation.
Used in:
The default value. This value is used if the state is omitted.
Request is received.
Request operation is waiting for scheduling.
Request is being processed.
The batch processing completed successfully.
The batch processing was cancelled.
The batch processing has failed.
The desired output location and metadata.
Used in:
Required.
The Google Cloud Storage location to write the output to.
The maximum number of pages to include in each output Document shard JSON on Google Cloud Storage. The valid range is [1, 100]. If not specified, the default value is 20. For example, for one PDF file with 100 pages, 100 parsed pages will be produced. If `pages_per_shard` = 20, then 5 Document shard JSON files, each containing 20 parsed pages, will be written under the prefix [OutputConfig.gcs_destination.uri][] and suffix pages-x-to-y.json, where x and y are 1-indexed page numbers. Example GCS outputs with 157 pages and pages_per_shard = 50:

<prefix>pages-001-to-050.json
<prefix>pages-051-to-100.json
<prefix>pages-101-to-150.json
<prefix>pages-151-to-157.json
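The shard filename suffixes above follow directly from the page count and `pages_per_shard`. A minimal sketch of the computation (the function name is hypothetical; padding width matches the three-digit examples shown):

```python
def shard_suffixes(total_pages: int, pages_per_shard: int = 20):
    """Compute the pages-x-to-y.json suffixes produced for a document,
    using 1-indexed, zero-padded page numbers as in the examples above."""
    suffixes = []
    for start in range(1, total_pages + 1, pages_per_shard):
        # The last shard may contain fewer than pages_per_shard pages.
        end = min(start + pages_per_shard - 1, total_pages)
        suffixes.append(f"pages-{start:03d}-to-{end:03d}.json")
    return suffixes
```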
Request to process one document.
Used in:
Required. Information about the input file.
Required. The desired output location.
Specifies a known document type for deeper structure detection. Valid values are currently "general" and "invoice". If not provided, "general" is used as the default. If any other value is given, the request is rejected.
Controls table extraction behavior. If not specified, the system will decide reasonable defaults.
Controls form extraction behavior. If not specified, the system will decide reasonable defaults.
Controls entity extraction behavior. If not specified, the system will decide reasonable defaults.
Controls OCR behavior. If not specified, the system will decide reasonable defaults.
Response to a single document processing request.
Used in:
Information about the input file. This is the same as the corresponding input config in the request.
The output location of the parsed responses. The responses are written to this location as JSON-serialized `Document` objects.
A hint for a table bounding box on the page for table parsing.
Used in:
Optional. The 1-based page number, for multi-page inputs, that this hint applies to. If not provided, the hint applies to all pages by default.
Bounding box hint for a table on this page. The coordinates must be normalized to [0,1] and the bounding box must be an axis-aligned rectangle.
Parameters to control table extraction behavior.
Used in:
Whether to enable table extraction.
Optional. Table bounding box hints that can be provided for complex cases in which our algorithm cannot locate the table(s).
Optional. Table header hints. The extraction will bias towards producing these terms as table headers, which may improve accuracy.
Model version of the table extraction system. Default is "builtin/stable". Specify "builtin/latest" for the latest model.
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
Used in:
X coordinate.
Y coordinate.