My use case: transcribing 1-hour user interview audio into text.
Result: still looking for a solution
Experience:
I used the Ruby gem google-api-client to access the system.
I set the GOOGLE_ACCOUNT_TYPE, GOOGLE_CLIENT_ID, GOOGLE_CLIENT_EMAIL, and GOOGLE_PRIVATE_KEY environment variables, based on the Service Account key I got from the Google Developer Console.
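For reference, this is roughly how those variables map onto the fields of the Service Account key; the values below are placeholders, not the ones I actually used, and googleauth reads them when building the default credentials:
# Placeholder values copied from the corresponding fields of the Service Account JSON key.
ENV['GOOGLE_ACCOUNT_TYPE'] = 'service_account'
ENV['GOOGLE_CLIENT_ID']    = '123456789012345678901'
ENV['GOOGLE_CLIENT_EMAIL'] = 'my-service-account@my-project.iam.gserviceaccount.com'
ENV['GOOGLE_PRIVATE_KEY']  = "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"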
I used this code to test an API request:
require 'google/apis/speech_v1beta1'
require 'googleauth'
audio_file_path = 'brooklyn.wav'
speech_service = Google::Apis::SpeechV1beta1::SpeechService.new
speech_service.authorization = Google::Auth.get_application_default(
  %w[https://www.googleapis.com/auth/cloud-platform]
)
request = Google::Apis::SpeechV1beta1::AsyncRecognizeRequest.new
# Read the audio as binary and send it inline with the request.
request.audio = {
  content: File.binread(audio_file_path)
}
request.config = {
  encoding: "LINEAR16", # or "FLAC"
  sample_rate: 16000    # or 44100
}
# Make the async request; the API returns a long-running operation.
response = speech_service.async_recognize_speech request
puts response.name
# Then, get the result of the async job by looking up the operation.
status = speech_service.get_operation response.name
The result of status should be the transcription response from the Google Speech API, containing the transcribed text of the uploaded audio snippet.
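In practice the operation is not finished immediately, so the returned name has to be polled until the job reports it is done. Here is a rough sketch of that loop, assuming the speech_service and response objects from the snippet above and assuming the operation's response comes back as a plain hash with a results key (the exact shape may differ):
# Poll the long-running operation until it completes.
status = speech_service.get_operation response.name
until status.done
  sleep 5
  status = speech_service.get_operation response.name
end
if status.error
  puts "Recognition failed: #{status.error}"
else
  # Each result carries alternatives with a transcript and a confidence score.
  (status.response['results'] || []).each do |result|
    result['alternatives'].each do |alt|
      puts "#{alt['confidence']}: #{alt['transcript']}"
    end
  end
end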
The Google Speech API is currently in beta, so I expect it to have a focal use case, and yes, the sample sound of the Brooklyn Bridge works well: a short, clear, concise snippet of audio. However, an open-ended ~40-minute conversation submitted to the Speech API returned an array of possible one-word transcriptions, each sorta funny, but ultimately abysmally inaccurate.
Verdict
A speech API. Amazing!
Not a transcription API, oh well.