Service that implements Google Cloud Speech API.
Perform bidirectional streaming speech recognition on audio using gRPC.
Perform non-streaming speech recognition on audio using HTTPS.
`NonStreamingRecognizeResponse` is the only message returned to the client by `NonStreamingRecognize`. It contains the result as zero or more sequential `RecognizeResponse` messages. Note that streaming `Recognize` will also return multiple `RecognizeResponse` messages, but each message is individually streamed.
[Output-only] Sequential list of messages returned by the recognizer.
Contains audio data in the format specified in the `InitialRecognizeRequest`. Either `content` or `uri` must be supplied. Supplying both or neither returns [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT].
Used in:
The audio data bytes encoded as specified in `InitialRecognizeRequest`. Note: as with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64.
URI that points to a file that contains audio data bytes as specified in `InitialRecognizeRequest`. Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: `gs://bucket_name/object_name` (other URI formats return [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]). For more information, see [Request URIs](/storage/docs/reference-uris).
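As the `content` description notes, the JSON representation base64-encodes bytes fields. A minimal sketch of building a JSON-ready `AudioRequest` body under the either/or rule above (the helper name is illustrative, not part of the API):

```python
import base64

def make_audio_request(raw_audio: bytes = None, uri: str = None) -> dict:
    """Build a JSON-ready AudioRequest dict. Exactly one of raw_audio
    or uri must be supplied, mirroring the content/uri rule above."""
    if (raw_audio is None) == (uri is None):
        # Supplying both or neither yields INVALID_ARGUMENT server-side.
        raise ValueError("supply exactly one of content or uri")
    if raw_audio is not None:
        # bytes fields are base64-encoded in the JSON representation
        return {"content": base64.b64encode(raw_audio).decode("ascii")}
    return {"uri": uri}

print(make_audio_request(raw_audio=b"\x00\x01\x02"))
```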
The `InitialRecognizeRequest` message provides information to the recognizer that specifies how to process the request.
Used in:
[Required] Encoding of audio data sent in all `AudioRequest` messages.
[Required] Sample rate in Hertz of the audio data sent in all `AudioRequest` messages. 16000 is optimal. Valid values are 8000-48000.
[Optional] The language of the supplied audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: "en-GB". If omitted, defaults to "en-US".
[Optional] Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of `SpeechRecognitionAlternative` messages within each `SpeechRecognitionResult`. The server may return fewer than `max_alternatives`. Valid values are `0`-`30`. A value of `0` or `1` will return a maximum of `1`. If omitted, defaults to `1`.
[Optional] If set to `true`, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". If set to `false` or omitted, profanities won't be filtered out.
[Optional] If `false` or omitted, the recognizer detects a single spoken utterance and ceases recognition when the user stops speaking. If `enable_endpointer_events` is `true`, it returns `END_OF_UTTERANCE` when it detects that the user has stopped speaking. In this case, it returns no more than one `SpeechRecognitionResult`, with the `is_final` flag set to `true`. If `true`, the recognizer continues recognition (even if the user pauses speaking) until the client sends an `end_of_data` message or the maximum time limit is reached. Multiple `SpeechRecognitionResult`s with the `is_final` flag set to `true` may be returned, each indicating that the recognizer will not return any further hypotheses for that portion of the transcript.
[Optional] If this parameter is `true`, interim results may be returned as they become available. If `false` or omitted, only `is_final=true` result(s) are returned.
[Optional] If this parameter is `true`, `EndpointerEvents` may be returned as they become available. If `false` or omitted, no `EndpointerEvents` are returned.
[Optional] URI that points to a file where the recognition result should be stored in JSON format. If omitted or empty string, the recognition result is returned in the response. Should be specified only for `NonStreamingRecognize`. If specified in a `Recognize` request, `Recognize` returns [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]. If specified in a `NonStreamingRecognize` request, `NonStreamingRecognize` returns immediately, and the output file is created asynchronously once the audio processing completes. Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: `gs://bucket_name/object_name` (other URI formats return [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]). For more information, see [Request URIs](/storage/docs/reference-uris).
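The field descriptions above can be collected into a request skeleton. The sketch below models the message as a plain dict; the field names (`encoding`, `sample_rate`, `language_code`, `max_alternatives`, `profanity_filter`, `continuous`, `interim_results`, `enable_endpointer_events`) are inferred from the descriptions and may not match the proto exactly.

```python
def make_initial_request(encoding: str, sample_rate: int, **optional) -> dict:
    """Assemble an InitialRecognizeRequest-like dict.

    Field names are inferred from the documentation above; with real
    generated stubs these would be fields on a message class.
    """
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000 Hz")
    req = {"encoding": encoding, "sample_rate": sample_rate}
    # Optional fields keep their documented defaults when omitted.
    defaults = {
        "language_code": "en-US",
        "max_alternatives": 1,
        "profanity_filter": False,
        "continuous": False,
        "interim_results": False,
        "enable_endpointer_events": False,
    }
    for key, default in defaults.items():
        req[key] = optional.get(key, default)
    return req

print(make_initial_request("LINEAR16", 16000, interim_results=True))
```

Note that the documented defaults are applied client-side here only for illustration; a real proto message leaves omitted fields unset and the server applies the defaults.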
Audio encoding of the data sent in the audio message.
Used in:
Not specified. Returns [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT].
Uncompressed 16-bit signed little-endian samples. This is the simplest encoding format, useful for getting started. However, because it is uncompressed, it is not recommended for deployed clients.
This is the recommended encoding format because it uses lossless compression; therefore recognition accuracy is not compromised by a lossy codec. The FLAC stream format is specified at http://flac.sourceforge.net/documentation.html. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported.
8-bit samples that compand 14-bit audio samples using PCMU/mu-law.
Adaptive Multi-Rate Narrowband codec. `sample_rate` must be 8000 Hz.
Adaptive Multi-Rate Wideband codec. `sample_rate` must be 16000 Hz.
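To make the `LINEAR16` layout above concrete — uncompressed 16-bit signed little-endian samples — here is a small sketch that packs floating-point samples into that format (the helper is illustrative, not part of the API):

```python
import struct

def to_linear16(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit signed little-endian PCM,
    i.e. the LINEAR16 encoding described above."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)

data = to_linear16([0.0, 0.5, -0.5, 1.0])
print(len(data))  # 2 bytes per sample
```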
`RecognizeRequest` is the only message type sent by the client. `NonStreamingRecognize` sends only one `RecognizeRequest` message, and it must contain both an `initial_request` and an `audio_request`. Streaming `Recognize` sends one or more `RecognizeRequest` messages. The first message must contain an `initial_request` and may contain an `audio_request`. Any subsequent messages must not contain an `initial_request` and must contain an `audio_request`.
Used as request type in: Speech.NonStreamingRecognize, Speech.Recognize
The `initial_request` message provides information to the recognizer that specifies how to process the request. The first `RecognizeRequest` message must contain an `initial_request`. Any subsequent `RecognizeRequest` messages must not contain an `initial_request`.
The audio data to be recognized. For `NonStreamingRecognize`, all the audio data must be contained in the first (and only) `RecognizeRequest` message. For streaming `Recognize`, sequential chunks of audio data are sent in sequential `RecognizeRequest` messages.
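The first-message rule above maps naturally onto a generator for the streaming case. This sketch models requests as plain dicts; with real gRPC stubs each yielded value would be a generated `RecognizeRequest` message instead.

```python
def recognize_requests(initial_request, audio_chunks):
    """Yield the RecognizeRequest sequence for streaming Recognize:
    the first message carries initial_request (and may carry audio);
    every later message carries audio only."""
    chunks = iter(audio_chunks)
    first_chunk = next(chunks, None)
    first = {"initial_request": initial_request}
    if first_chunk is not None:
        first["audio_request"] = {"content": first_chunk}
    yield first
    for chunk in chunks:
        yield {"audio_request": {"content": chunk}}

msgs = list(recognize_requests({"encoding": "LINEAR16", "sample_rate": 16000},
                               [b"aa", b"bb", b"cc"]))
print(len(msgs))
```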
`RecognizeResponse` is the only message type returned to the client.
Used as response type in: Speech.Recognize
Used as field type in:
[Output-only] If set, returns a [google.rpc.Status][] message that specifies the error for the operation.
[Output-only] For `continuous=false`, this repeated list contains zero or one result that corresponds to all of the audio processed so far. For `continuous=true`, this repeated list contains zero or more results that correspond to consecutive portions of the audio being processed. In both cases, contains zero or one `is_final=true` result (the newly settled portion), followed by zero or more `is_final=false` results.
[Output-only] Indicates the lowest index in the `results` array that has changed. The repeated `SpeechRecognitionResult` results overwrite past results at this index and higher.
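The `result_index` semantics above imply a simple client-side merge: keep a running list of results and overwrite everything from `result_index` onward with each new response. A sketch using simplified dict-shaped responses:

```python
def merge_results(accumulated, response):
    """Apply one RecognizeResponse to the accumulated result list:
    results at result_index and higher are replaced by the new ones."""
    index = response.get("result_index", 0)
    return accumulated[:index] + response["results"]

history = []
for resp in [
    {"result_index": 0, "results": [{"transcript": "hello", "is_final": False}]},
    {"result_index": 0, "results": [{"transcript": "hello world", "is_final": True}]},
    {"result_index": 1, "results": [{"transcript": "again", "is_final": False}]},
]:
    history = merge_results(history, resp)
print(history)
```

After the last response, the settled `is_final=true` result at index 0 is untouched and only the interim portion at index 1 has been replaced.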
[Output-only] Indicates the type of endpointer event.
Indicates the type of endpointer event.
Used in:
No endpointer event specified.
Speech has been detected in the audio stream.
Speech has ceased to be detected in the audio stream.
The end of the audio stream has been reached, and it is being processed.
This event is only sent when `continuous` is `false`. It indicates that the server has detected the end of the user's speech utterance and expects no additional speech. Therefore, the server will not process additional audio. The client should stop sending additional audio data.
Alternative hypotheses (a.k.a. n-best list).
Used in:
[Output-only] Transcript text representing the words that the user spoke.
[Output-only] The confidence estimate between 0.0 and 1.0. A higher number means the system is more confident that the recognition is correct. This field is typically provided only for the top hypothesis, and only for `is_final=true` results. The default of 0.0 is a sentinel value indicating confidence was not set.
A speech recognition result corresponding to a portion of the audio.
Used in:
[Output-only] May contain one or more recognition hypotheses (up to the maximum specified in `max_alternatives`).
[Output-only] Set to `true` if this is the final time the speech service will return this particular `SpeechRecognitionResult`. If `false`, this represents an interim result that may change.
[Output-only] An estimate of the probability that the recognizer will not change its guess about this interim result. Values range from 0.0 (completely unstable) to 1.0 (completely stable). Note that this is not the same as `confidence`, which estimates the probability that a recognition result is correct. This field is only provided for interim results (`is_final=false`). The default of 0.0 is a sentinel value indicating stability was not set.
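Putting the fields above together, a client typically concatenates the top alternative of each finalized result and appends the best current interim guess. This sketch uses simplified result dicts and assumes alternatives are ordered best-first, as the "n-best list" description suggests; that ordering is an assumption, not stated explicitly above.

```python
def best_transcript(results):
    """Join the top alternative of each is_final result, then append
    the top alternative of the first interim result, if any."""
    finals, interim = [], ""
    for r in results:
        alts = r.get("alternatives", [])
        if not alts:
            continue
        top = alts[0]["transcript"]  # assumed best-first ordering
        if r.get("is_final"):
            finals.append(top)
        elif not interim:
            interim = top
    return " ".join(finals + ([interim] if interim else []))

results = [
    {"is_final": True, "alternatives": [{"transcript": "hello world", "confidence": 0.92}]},
    {"is_final": False, "alternatives": [{"transcript": "how are"}]},
]
print(best_transcript(results))
```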