Service that implements Google Cloud Speech API.
Perform bidirectional streaming speech recognition on audio using gRPC.
Perform non-streaming speech recognition on audio using HTTPS.
`NonStreamingRecognizeResponse` is the only message returned to the client by `NonStreamingRecognize`. It contains the result as zero or more sequential `RecognizeResponse` messages. Note that streaming `Recognize` will also return multiple `RecognizeResponse` messages, but each message is individually streamed.
[Output-only] Sequential list of messages returned by the recognizer.
Contains audio data in the format specified in the `InitialRecognizeRequest`. Either `content` or `uri` must be supplied. Supplying both or neither returns [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT].
Used in:
The audio data bytes encoded as specified in `InitialRecognizeRequest`. Note: as with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64.
URI that points to a file that contains audio data bytes as specified in `InitialRecognizeRequest`. Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: `gs://bucket_name/object_name` (other URI formats return [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]). For more information, see [Request URIs](/storage/docs/reference-uris).
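As the `content` description notes, the JSON representation base64-encodes bytes fields. A minimal sketch of building a JSON-ready `AudioRequest` body under the either/or rule above (the helper name is illustrative, not part of the API):

```python
import base64

def make_audio_request(raw_audio: bytes = None, uri: str = None) -> dict:
    """Build a JSON-ready AudioRequest dict. Exactly one of raw_audio
    or uri must be supplied, mirroring the content/uri rule above."""
    if (raw_audio is None) == (uri is None):
        # Supplying both or neither yields INVALID_ARGUMENT server-side.
        raise ValueError("supply exactly one of content or uri")
    if raw_audio is not None:
        # bytes fields are base64-encoded in the JSON representation
        return {"content": base64.b64encode(raw_audio).decode("ascii")}
    return {"uri": uri}

print(make_audio_request(raw_audio=b"\x00\x01\x02"))
```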
The `InitialRecognizeRequest` message provides information to the recognizer that specifies how to process the request.
Used in:
[Required] Encoding of audio data sent in all `AudioRequest` messages.
[Required] Sample rate in Hertz of the audio data sent in all `AudioRequest` messages. 16000 is optimal. Valid values are 8000-48000.
[Optional] The language of the supplied audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: "en-GB". If omitted, defaults to "en-US".
[Optional] Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of `SpeechRecognitionAlternative` messages within each `SpeechRecognitionResult`. The server may return fewer than `max_alternatives`. Valid values are `0`-`30`. A value of `0` or `1` will return a maximum of `1`. If omitted, defaults to `1`.
[Optional] If set to `true`, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". If set to `false` or omitted, profanities won't be filtered out.
[Optional] If `false` or omitted, the recognizer detects a single spoken utterance and ceases recognition when the user stops speaking. If `enable_endpointer_events` is `true`, it returns `END_OF_UTTERANCE` when it detects that the user has stopped speaking. In this case, it returns no more than one `SpeechRecognitionResult`, with the `is_final` flag set to `true`. If `true`, the recognizer continues recognition (even if the user pauses speaking) until the client sends an `end_of_data` message or the maximum time limit is reached. Multiple `SpeechRecognitionResult`s with the `is_final` flag set to `true` may be returned, each indicating that the recognizer will not return any further hypotheses for that portion of the transcript.
[Optional] If this parameter is `true`, interim results may be returned as they become available. If `false` or omitted, only `is_final=true` result(s) are returned.
[Optional] If this parameter is `true`, `EndpointerEvents` may be returned as they become available. If `false` or omitted, no `EndpointerEvents` are returned.
[Optional] URI that points to a file where the recognition result should be stored in JSON format. If omitted or empty string, the recognition result is returned in the response. Should be specified only for `NonStreamingRecognize`. If specified in a `Recognize` request, `Recognize` returns [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]. If specified in a `NonStreamingRecognize` request, `NonStreamingRecognize` returns immediately, and the output file is created asynchronously once the audio processing completes. Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: `gs://bucket_name/object_name` (other URI formats return [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]). For more information, see [Request URIs](/storage/docs/reference-uris).
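The field descriptions above can be collected into a request skeleton. The sketch below models the message as a plain dict; the field names (`encoding`, `sample_rate`, `language_code`, `max_alternatives`, `profanity_filter`, `continuous`, `interim_results`, `enable_endpointer_events`) are inferred from the descriptions and may not match the proto exactly.

```python
def make_initial_request(encoding: str, sample_rate: int, **optional) -> dict:
    """Assemble an InitialRecognizeRequest-like dict.

    Field names are inferred from the documentation above; with real
    generated stubs these would be fields on a message class.
    """
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000 Hz")
    req = {"encoding": encoding, "sample_rate": sample_rate}
    # Optional fields keep their documented defaults when omitted.
    defaults = {
        "language_code": "en-US",
        "max_alternatives": 1,
        "profanity_filter": False,
        "continuous": False,
        "interim_results": False,
        "enable_endpointer_events": False,
    }
    for key, default in defaults.items():
        req[key] = optional.get(key, default)
    return req

print(make_initial_request("LINEAR16", 16000, interim_results=True))
```

Note that the documented defaults are applied client-side here only for illustration; a real proto message leaves omitted fields unset and the server applies the defaults.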
Audio encoding of the data sent in the audio message.
Used in:
Not specified. Returns [google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT].
Uncompressed 16-bit signed little-endian samples. This is the simplest encoding format, useful for getting started. However, because it is uncompressed, it is not recommended for deployed clients.
This is the recommended encoding format because it uses lossless compression; therefore recognition accuracy is not compromised by a lossy codec. The FLAC stream format is specified at http://flac.sourceforge.net/documentation.html. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported.
8-bit samples that compand 14-bit audio samples using PCMU/mu-law.
Adaptive Multi-Rate Narrowband codec. `sample_rate` must be 8000 Hz.
Adaptive Multi-Rate Wideband codec. `sample_rate` must be 16000 Hz.
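To make the `LINEAR16` layout above concrete — uncompressed 16-bit signed little-endian samples — here is a small sketch that packs floating-point samples into that format (the helper is illustrative, not part of the API):

```python
import struct

def to_linear16(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit signed little-endian PCM,
    i.e. the LINEAR16 encoding described above."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)

data = to_linear16([0.0, 0.5, -0.5, 1.0])
print(len(data))  # 2 bytes per sample
```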
`RecognizeRequest` is the only message type sent by the client. `NonStreamingRecognize` sends only one `RecognizeRequest` message, and it must contain both an `initial_request` and an `audio_request`. Streaming `Recognize` sends one or more `RecognizeRequest` messages. The first message must contain an `initial_request` and may contain an `audio_request`. Any subsequent messages must not contain an `initial_request` and must contain an `audio_request`.
Used as request type in: Speech.NonStreamingRecognize, Speech.Recognize
The `initial_request` message provides information to the recognizer that specifies how to process the request. The first `RecognizeRequest` message must contain an `initial_request`. Any subsequent `RecognizeRequest` messages must not contain an `initial_request`.
The audio data to be recognized. For `NonStreamingRecognize`, all the audio data must be contained in the first (and only) `RecognizeRequest` message. For streaming `Recognize`, sequential chunks of audio data are sent in sequential `RecognizeRequest` messages.
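The first-message rule above maps naturally onto a generator for the streaming case. This sketch models requests as plain dicts; with real gRPC stubs each yielded value would be a generated `RecognizeRequest` message instead.

```python
def recognize_requests(initial_request, audio_chunks):
    """Yield the RecognizeRequest sequence for streaming Recognize:
    the first message carries initial_request (and may carry audio);
    every later message carries audio only."""
    chunks = iter(audio_chunks)
    first_chunk = next(chunks, None)
    first = {"initial_request": initial_request}
    if first_chunk is not None:
        first["audio_request"] = {"content": first_chunk}
    yield first
    for chunk in chunks:
        yield {"audio_request": {"content": chunk}}

msgs = list(recognize_requests({"encoding": "LINEAR16", "sample_rate": 16000},
                               [b"aa", b"bb", b"cc"]))
print(len(msgs))
```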
`RecognizeResponse` is the only message type returned to the client.
Used as response type in: Speech.Recognize
Used as field type in:
[Output-only] If set, returns a [google.rpc.Status][] message that specifies the error for the operation.
[Output-only] For `continuous=false`, this repeated list contains zero or one result that corresponds to all of the audio processed so far. For `continuous=true`, this repeated list contains zero or more results that correspond to consecutive portions of the audio being processed. In both cases, contains zero or one `is_final=true` result (the newly settled portion), followed by zero or more `is_final=false` results.
[Output-only] Indicates the lowest index in the `results` array that has changed. The repeated `SpeechRecognitionResult` results overwrite past results at this index and higher.
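The `result_index` semantics above imply a simple client-side merge: keep a running list of results and overwrite everything from `result_index` onward with each new response. A sketch using simplified dict-shaped responses:

```python
def merge_results(accumulated, response):
    """Apply one RecognizeResponse to the accumulated result list:
    results at result_index and higher are replaced by the new ones."""
    index = response.get("result_index", 0)
    return accumulated[:index] + response["results"]

history = []
for resp in [
    {"result_index": 0, "results": [{"transcript": "hello", "is_final": False}]},
    {"result_index": 0, "results": [{"transcript": "hello world", "is_final": True}]},
    {"result_index": 1, "results": [{"transcript": "again", "is_final": False}]},
]:
    history = merge_results(history, resp)
print(history)
```

After the last response, the settled `is_final=true` result at index 0 is untouched and only the interim portion at index 1 has been replaced.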
[Output-only] Indicates the type of endpointer event.
Indicates the type of endpointer event.
Used in:
No endpointer event specified.
Speech has been detected in the audio stream.
Speech has ceased to be detected in the audio stream.
The end of the audio stream has been reached, and it is being processed.
This event is only sent when `continuous` is `false`. It indicates that the server has detected the end of the user's speech utterance and expects no additional speech. Therefore, the server will not process additional audio. The client should stop sending additional audio data.
Alternative hypotheses (a.k.a. n-best list).
Used in:
[Output-only] Transcript text representing the words that the user spoke.
[Output-only] The confidence estimate between 0.0 and 1.0. A higher number means the system is more confident that the recognition is correct. This field is typically provided only for the top hypothesis, and only for `is_final=true` results. The default of 0.0 is a sentinel value indicating confidence was not set.
A speech recognition result corresponding to a portion of the audio.
Used in:
[Output-only] May contain one or more recognition hypotheses (up to the maximum specified in `max_alternatives`).
[Output-only] Set to `true` if this is the final time the speech service will return this particular `SpeechRecognitionResult`. If `false`, this represents an interim result that may change.
[Output-only] An estimate of the probability that the recognizer will not change its guess about this interim result. Values range from 0.0 (completely unstable) to 1.0 (completely stable). Note that this is not the same as `confidence`, which estimates the probability that a recognition result is correct. This field is only provided for interim results (`is_final=false`). The default of 0.0 is a sentinel value indicating stability was not set.
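Putting the fields above together, a client typically concatenates the top alternative of each finalized result and appends the best current interim guess. This sketch uses simplified result dicts and assumes alternatives are ordered best-first, as the "n-best list" description suggests; that ordering is an assumption, not stated explicitly above.

```python
def best_transcript(results):
    """Join the top alternative of each is_final result, then append
    the top alternative of the first interim result, if any."""
    finals, interim = [], ""
    for r in results:
        alts = r.get("alternatives", [])
        if not alts:
            continue
        top = alts[0]["transcript"]  # assumed best-first ordering
        if r.get("is_final"):
            finals.append(top)
        elif not interim:
            interim = top
    return " ".join(finals + ([interim] if interim else []))

results = [
    {"is_final": True, "alternatives": [{"transcript": "hello world", "confidence": 0.92}]},
    {"is_final": False, "alternatives": [{"transcript": "how are"}]},
]
print(best_transcript(results))
```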