Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.
LRO endpoint to batch process many documents.
Request to batch process documents as an asynchronous operation.
Required. Individual requests for each document.
Target project and location to make a call. Format: `projects/{project-id}/locations/{location-id}`. If no location is specified, a region will be chosen automatically.
Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.
Responses for each individual document.
A bounding polygon for the detected image annotation.
Used in:
The bounding polygon vertices.
The bounding polygon normalized vertices.
Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.
Original source document from the user.
Currently supports Google Cloud Storage URI of the form `gs://bucket_name/object_name`. Object versioning is not supported. See [Google Cloud Storage Request URIs](https://cloud.google.com/storage/docs/reference-uris) for more info.
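A `gs://bucket_name/object_name` URI as described above can be split into its bucket and object parts with a simple helper (a sketch for illustration; the function name is hypothetical):

```python
def parse_gcs_uri(uri: str):
    """Split a `gs://bucket_name/object_name` URI into (bucket, object)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Google Cloud Storage URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the object name.
    bucket, _, obj = uri[len(prefix):].partition("/")
    return bucket, obj
```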
Inline document content, represented as a stream of bytes. Note: As with all `bytes` fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.
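As an illustration of the note above, inline `bytes` content must be base64-encoded when the message is expressed as JSON. A minimal sketch (the helper names are hypothetical, not part of the API):

```python
import base64

def content_to_json_value(raw_bytes: bytes) -> str:
    """Encode raw document bytes as the base64 string the JSON mapping expects."""
    return base64.b64encode(raw_bytes).decode("ascii")

def content_from_json_value(encoded: str) -> bytes:
    """Decode a base64 JSON value back into the raw bytes of the document."""
    return base64.b64decode(encoded)
```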
An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml.
UTF-8 encoded text in reading order from the document.
Styles for the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Visual page layout for the [Document][google.cloud.documentai.v1beta1.Document].
A list of entities detected on [Document.text][google.cloud.documentai.v1beta1.Document.text]. For document shards, entities in this list may cross shard boundaries.
Relationship among [Document.entities][google.cloud.documentai.v1beta1.Document.entities].
Information about sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified.
Any error that occurred while processing this document.
A phrase in the text that is a known entity type, such as a person, an organization, or a location.
Used in:
Provenance of the entity. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Entity type from a schema e.g. `Address`.
Text value in the document e.g. `1600 Amphitheatre Pkwy`.
Canonical mention name. This will be a unique value in the entity list for this document.
Relationship between [Entities][google.cloud.documentai.v1beta1.Document.Entity].
Used in:
Subject entity mention_id.
Object entity mention_id.
Relationship description.
A page in a [Document][google.cloud.documentai.v1beta1.Document].
Used in:
1-based index for current [Page][google.cloud.documentai.v1beta1.Document.Page] in a parent [Document][google.cloud.documentai.v1beta1.Document]. Useful when a page is taken out of a [Document][google.cloud.documentai.v1beta1.Document] for individual processing.
Physical dimension of the page.
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for the page.
A list of detected languages together with confidence.
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.
A list of visually detected tokens on the page.
A list of detected non-text visual elements on the page, e.g. checkboxes and signatures.
A list of visually detected tables on the page.
A list of visually detected form fields on the page.
A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Block][google.cloud.documentai.v1beta1.Document.Page.Block].
A list of detected languages together with confidence.
Detected language for a structural component.
Used in:
The BCP-47 language code, such as "en-US" or "sr-Latn". For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.
Confidence of detected language. Range [0, 1].
Dimension for the page.
Used in:
Page width.
Page height.
Dimension unit.
A form field detected on the page.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta1.Document.Page.FormField] name. e.g. `Address`, `Email`, `Grand total`, `Phone number`, etc.
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta1.Document.Page.FormField] value.
A list of detected languages for name together with confidence.
A list of detected languages for value together with confidence.
Visual element describing a layout unit on a page.
Used in:
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Confidence of the current [Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].
The bounding polygon for the [Layout][google.cloud.documentai.v1beta1.Document.Page.Layout].
Detected orientation for the [Layout][google.cloud.documentai.v1beta1.Document.Page.Layout].
Detected human reading orientation.
Used in:
Unspecified orientation.
Orientation is aligned with page up.
Orientation is aligned with page right. Turn the head 90 degrees clockwise from upright to read.
Orientation is aligned with page down. Turn the head 180 degrees from upright to read.
Orientation is aligned with page left. Turn the head 90 degrees counterclockwise from upright to read.
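The head-turn descriptions above can be tabulated directly. A minimal sketch (the string keys and dictionary name are simplified stand-ins for the proto enum values):

```python
# Degrees of clockwise head turn from upright needed to read text,
# per the orientation descriptions above.
HEAD_TURN_CW_DEGREES = {
    "ORIENTATION_UNSPECIFIED": None,  # unknown orientation
    "PAGE_UP": 0,       # already upright
    "PAGE_RIGHT": 90,   # turn head 90 degrees clockwise
    "PAGE_DOWN": 180,   # upside down
    "PAGE_LEFT": 270,   # 90 degrees counterclockwise
}
```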
A collection of tokens that a human would perceive as a line. Does not cross column boundaries, can be horizontal, vertical, etc.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Line][google.cloud.documentai.v1beta1.Document.Page.Line].
A list of detected languages together with confidence.
A collection of lines that a human would perceive as a paragraph.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Paragraph][google.cloud.documentai.v1beta1.Document.Page.Paragraph].
A list of detected languages together with confidence.
A table representation similar to HTML table structure.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Table][google.cloud.documentai.v1beta1.Document.Page.Table].
Header rows of the table.
Body rows of the table.
A list of detected languages together with confidence.
A cell representation inside the table.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [TableCell][google.cloud.documentai.v1beta1.Document.Page.Table.TableCell].
How many rows this cell spans.
How many columns this cell spans.
A list of detected languages together with confidence.
A row of table cells.
Used in:
Cells that make up this row.
A detected token.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [Token][google.cloud.documentai.v1beta1.Document.Page.Token].
Detected break at the end of a [Token][google.cloud.documentai.v1beta1.Document.Page.Token].
A list of detected languages together with confidence.
Detected break at the end of a [Token][google.cloud.documentai.v1beta1.Document.Page.Token].
Used in:
Detected break type.
Enum to denote the type of break found.
Used in:
Unspecified break type.
A single whitespace.
A wider whitespace.
A hyphen that indicates that a token has been split across lines.
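The hyphen break type above signals that a token was split across lines, so the two halves should be joined without a separator when reconstructing text. A minimal sketch, using (text, break_type) pairs as a simplified stand-in for Token and DetectedBreak:

```python
def join_tokens(tokens):
    """Rejoin token texts using their detected break types.

    `tokens` is a list of (text, break_type) pairs, where break_type is one of
    "SPACE", "WIDE_SPACE", "HYPHEN", or None (no break).
    """
    out = []
    for text, break_type in tokens:
        out.append(text)
        if break_type in ("SPACE", "WIDE_SPACE"):
            out.append(" ")
        # A HYPHEN break means the token continues on the next line,
        # so nothing is inserted and the halves join directly.
    return "".join(out)
```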
A detected non-text visual element on the page, e.g. a checkbox or signature.
Used in:
[Layout][google.cloud.documentai.v1beta1.Document.Page.Layout] for [VisualElement][google.cloud.documentai.v1beta1.Document.Page.VisualElement].
Type of the [VisualElement][google.cloud.documentai.v1beta1.Document.Page.VisualElement].
A list of detected languages together with confidence.
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
Used in:
The 0-based index of this shard.
Total number of shards.
The index of the first character in [Document.text][google.cloud.documentai.v1beta1.Document.text] in the overall document global text.
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
Used in:
Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Text color.
Text background color.
Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp
Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp
Text decoration. Follows CSS standard. <text-decoration-line> <text-decoration-color> <text-decoration-style> https://www.w3schools.com/cssref/pr_text_text-decoration.asp
Font size.
Font size with unit.
Used in:
Font size for the text.
Unit for the font size. Follows CSS naming (in, px, pt, etc.).
Text reference indexing into the [Document.text][google.cloud.documentai.v1beta1.Document.text].
Used in:
The text segments from the [Document.text][google.cloud.documentai.v1beta1.Document.text].
A text segment in the [Document.text][google.cloud.documentai.v1beta1.Document.text]. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See [ShardInfo.text_offset][google.cloud.documentai.v1beta1.Document.ShardInfo.text_offset].
Used in:
[TextSegment][google.cloud.documentai.v1beta1.Document.TextAnchor.TextSegment] start UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta1.Document.text].
[TextSegment][google.cloud.documentai.v1beta1.Document.TextAnchor.TextSegment] half open end UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta1.Document.text].
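Because the end index is half open (exclusive), a segment's text can be recovered with an ordinary slice. A minimal sketch, assuming indices count Unicode characters the same way Python string indices do:

```python
def segment_text(document_text: str, start_index: int, end_index: int) -> str:
    """Extract the text covered by a TextSegment.

    end_index is exclusive (half open), matching the proto's convention,
    so it maps directly onto Python slicing.
    """
    return document_text[start_index:end_index]
```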
Parameters to control entity extraction behavior.
Used in:
Whether to enable entity extraction.
Model version of the entity extraction. Default is "builtin/stable". Specify "builtin/latest" for the latest model.
Parameters to control form extraction behavior.
Used in:
Whether to enable form extraction.
The user can provide pairs of (key text, value type) to improve the parsing result. For example, if a document has a field called "Date" that holds a date value and a field called "Amount" that may hold either a currency value (e.g. "$500.00") or a simple number value (e.g. "20"), you could use the following hints:

[
  {"key": "Date", "value_types": ["DATE"]},
  {"key": "Amount", "value_types": ["PRICE", "NUMBER"]}
]

If the value type is unknown but you want to provide hints for the keys, you can leave the value_types field empty, e.g. {"key": "Date", "value_types": []}.
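The hint structure above can be built programmatically. A minimal sketch (the helper name is hypothetical; the dict shape mirrors the JSON example):

```python
def key_value_hint(key: str, value_types=None) -> dict:
    """Build one (key text, value types) hint in the shape shown above.

    Pass an empty list (or nothing) when the value type is unknown.
    """
    return {"key": key, "value_types": list(value_types or [])}

# The example from the documentation, expressed as Python dicts.
hints = [
    key_value_hint("Date", ["DATE"]),
    key_value_hint("Amount", ["PRICE", "NUMBER"]),
]
```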
Model version of the form extraction system. Default is "builtin/stable". Specify "builtin/latest" for the latest model.
The Google Cloud Storage location where the output file will be written to.
Used in:
The Google Cloud Storage location where the input file will be read from.
Used in:
The desired input location and metadata.
Used in:
Required.
The Google Cloud Storage location to read the input from. This must be a single file.
Required. MIME type of the input. Currently supported MIME types are application/pdf, image/tiff, and image/gif.
User-provided hint for key value pair.
Used in:
The key text for the hint.
Type of the value. This is case-insensitive, and could be one of: ADDRESS, LOCATION, ORGANIZATION, PERSON, PHONE_NUMBER, ID, NUMBER, EMAIL, PRICE, TERMS, DATE, NAME. Types not in this list will be ignored.
A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.
Used in:
X coordinate.
Y coordinate.
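A normalized vertex is obtained from an absolute pixel vertex by dividing each coordinate by the corresponding image dimension, so the result lies in [0, 1]. A minimal sketch (the function name is hypothetical):

```python
def normalize_vertex(x: float, y: float, image_width: float, image_height: float):
    """Convert an absolute pixel vertex to a normalized vertex whose
    coordinates are relative to the original image, in the range [0, 1]."""
    return (x / image_width, y / image_height)
```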
Parameters to control Optical Character Recognition (OCR) behavior.
Used in:
List of languages to use for OCR. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting `language_hints` is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Document processing returns an error if one or more of the specified languages is not one of the supported languages.
Contains metadata for the BatchProcessDocuments operation.
The state of the current batch processing.
A message providing more details about the current state of processing.
The creation time of the operation.
The last update time of the operation.
Used in:
The default value. This value is used if the state is omitted.
Request is received.
Request operation is waiting for scheduling.
Request is being processed.
The batch processing completed successfully.
The batch processing was cancelled.
The batch processing has failed.
The desired output location and metadata.
Used in:
Required.
The Google Cloud Storage location to write the output to.
The maximum number of pages to include in each output Document shard JSON on Google Cloud Storage. The valid range is [1, 100]. If not specified, the default value is 20. For example, for one PDF file with 100 pages, 100 parsed pages will be produced. If `pages_per_shard` = 20, then 5 Document shard JSON files, each containing 20 parsed pages, will be written under the prefix [OutputConfig.gcs_destination.uri][] and suffix pages-x-to-y.json, where x and y are 1-indexed page numbers. Example GCS outputs with 157 pages and pages_per_shard = 50:

<prefix>pages-001-to-050.json
<prefix>pages-051-to-100.json
<prefix>pages-101-to-150.json
<prefix>pages-151-to-157.json
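The shard filename suffixes above follow directly from the page count and `pages_per_shard`. A minimal sketch of the computation (the function name is hypothetical; padding width matches the three-digit examples shown):

```python
def shard_suffixes(total_pages: int, pages_per_shard: int = 20):
    """Compute the pages-x-to-y.json suffixes produced for a document,
    using 1-indexed, zero-padded page numbers as in the examples above."""
    suffixes = []
    for start in range(1, total_pages + 1, pages_per_shard):
        # The last shard may contain fewer than pages_per_shard pages.
        end = min(start + pages_per_shard - 1, total_pages)
        suffixes.append(f"pages-{start:03d}-to-{end:03d}.json")
    return suffixes
```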
Request to process one document.
Used in:
Required. Information about the input file.
Required. The desired output location.
Specifies a known document type for deeper structure detection. Valid values are currently "general" and "invoice". If not provided, "general" is used as the default. If any other value is given, the request is rejected.
Controls table extraction behavior. If not specified, the system will decide reasonable defaults.
Controls form extraction behavior. If not specified, the system will decide reasonable defaults.
Controls entity extraction behavior. If not specified, the system will decide reasonable defaults.
Controls OCR behavior. If not specified, the system will decide reasonable defaults.
Response to a single document processing request.
Used in:
Information about the input file. This is the same as the corresponding input config in the request.
The output location of the parsed responses. The responses are written to this location as JSON-serialized `Document` objects.
A hint for a table bounding box on the page for table parsing.
Used in:
Optional. The 1-based page number, for multi-page inputs, that this hint applies to. If not provided, the hint applies to all pages by default.
Bounding box hint for a table on this page. The coordinates must be normalized to [0,1] and the bounding box must be an axis-aligned rectangle.
Parameters to control table extraction behavior.
Used in:
Whether to enable table extraction.
Optional. Table bounding box hints that can be provided for complex cases in which our algorithm cannot locate the table(s).
Optional. Table header hints. The extraction will bias towards producing these terms as table headers, which may improve accuracy.
Model version of the table extraction system. Default is "builtin/stable". Specify "builtin/latest" for the latest model.
A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.
Used in:
X coordinate.
Y coordinate.