A demo iOS application that generates an accurately timestamped transcript from an audio file and its pre-supplied text.
It uses Google Cloud Platform's Speech-to-Text API for its time offsets feature (that is, returning each word's time offset relative to the beginning of the audio). Any mistranscriptions from the service can then be corrected by aligning the recognized words against the pre-supplied text with a sequence alignment algorithm, so that each correct word inherits the timing of the recognized word it lines up with.
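The correction step can be illustrated with a minimal sketch. It is not this project's actual implementation: it assumes a plain edit-distance (Needleman-Wunsch-style) alignment over normalized words, and the names `TimedWord`, `AlignedWord`, `normalize`, and `align` are hypothetical. Words from the known text keep their spelling and borrow the timing of the recognized word they align with; a word the service missed entirely gets no timing.

```swift
import Foundation

/// Hypothetical type: a word returned by the speech service, with offsets
/// in seconds from the beginning of the audio.
struct TimedWord {
    let text: String
    let start: TimeInterval
    let end: TimeInterval
}

/// Hypothetical type: a word from the known-correct text. Timing is nil when
/// no recognized word could be aligned to it.
struct AlignedWord {
    let text: String
    let start: TimeInterval?
    let end: TimeInterval?
}

/// Lowercases and strips punctuation so "Fox," and "fox" compare as equal.
func normalize(_ word: String) -> String {
    word.lowercased().filter { $0.isLetter || $0.isNumber }
}

/// Aligns the reference words with the recognized words by minimum edit
/// distance, then transfers timings along the alignment path.
func align(reference: [String], recognized: [TimedWord]) -> [AlignedWord] {
    let n = reference.count
    let m = recognized.count
    let ref = reference.map(normalize)
    let rec = recognized.map { normalize($0.text) }

    // cost[i][j]: minimum edits to align the first i reference words
    // with the first j recognized words.
    var cost = Array(repeating: Array(repeating: 0, count: m + 1), count: n + 1)
    for i in 0...n { cost[i][0] = i }
    for j in 0...m { cost[0][j] = j }
    for i in stride(from: 1, through: n, by: 1) {
        for j in stride(from: 1, through: m, by: 1) {
            let substitution = cost[i - 1][j - 1] + (ref[i - 1] == rec[j - 1] ? 0 : 1)
            cost[i][j] = min(substitution, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
        }
    }

    // Walk back from the bottom-right corner to recover one optimal alignment.
    var result = [AlignedWord]()
    var i = n
    var j = m
    while i > 0 {
        if j > 0 && cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] == rec[j - 1] ? 0 : 1) {
            // Match or mistranscription: keep the reference spelling,
            // borrow the recognized word's timing.
            let word = recognized[j - 1]
            result.append(AlignedWord(text: reference[i - 1], start: word.start, end: word.end))
            i -= 1
            j -= 1
        } else if cost[i][j] == cost[i - 1][j] + 1 {
            // The service dropped this word: keep it without timing.
            result.append(AlignedWord(text: reference[i - 1], start: nil, end: nil))
            i -= 1
        } else {
            // The service produced an extra word: skip it.
            j -= 1
        }
    }
    return result.reversed()
}
```

When several alignments have the same cost, this sketch simply takes the first one the traceback finds. Words left without timing could be filled in afterwards, for example by interpolating between their timed neighbors.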
Why is this useful?
When the text of a piece of audio is known upfront, this approach yields correct word-level timestamps for that text. The resulting timings are useful for interactive language education and for music lyrics.
Copy Secrets.template.plist to Secrets.plist and provide your Google Cloud Platform API key for the field GOOGLE_API_KEY.
Run the SpeechTimestamper target in Xcode.
This project uses gRPC instead of REST to communicate with GCP services. While there is not yet an official Google Cloud Client Library for Swift, this project can be used as an example of how to interact with Google Cloud services (polling long-running operations, etc.) using Swift with gRPC. For basic functionality of the gRPC API for Swift, see the Swift gRPC Overview.
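As a rough illustration of the polling pattern mentioned above, here is a minimal sketch that is independent of any gRPC library. It does not reproduce this project's code: `OperationState`, `PollError`, and `pollUntilDone` are hypothetical names, and the `fetch` closure stands in for whatever call retrieves the current state of the long-running operation. The sketch retries with exponential backoff until the operation reports completion or the attempt budget runs out.

```swift
import Foundation

/// Hypothetical stand-in for the state of a long-running operation.
enum OperationState<Value> {
    case pending
    case done(Value)
}

enum PollError: Error {
    case timedOut
}

/// Calls `fetch` up to `maxAttempts` times, sleeping between attempts and
/// doubling the delay each time (capped at `maxDelay`).
/// This blocks the calling thread, so it should not run on the main thread.
func pollUntilDone<Value>(
    maxAttempts: Int,
    initialDelay: TimeInterval,
    maxDelay: TimeInterval = 30,
    fetch: () throws -> OperationState<Value>
) throws -> Value {
    var delay = initialDelay
    for attempt in 0..<maxAttempts {
        if case .done(let value) = try fetch() {
            return value
        }
        // Only wait if another attempt will follow.
        if attempt + 1 < maxAttempts {
            Thread.sleep(forTimeInterval: delay)
            delay = min(delay * 2, maxDelay)
        }
    }
    throw PollError.timedOut
}
```

In an app, `fetch` would wrap the request that asks the service for the operation's status, and the whole call would run on a background queue.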
If you'd like to generate the most up-to-date protobuf definitions, install protobuf and then run a fresh pod update.