How to Get text from speech - Example with AWS Transcribe and Lambda

By msuarez

In today’s fast-paced digital world, the ability to convert audio and video files into text is a game-changer for businesses and organizations across various sectors, from media and entertainment to education and legal services. Implementing this capability can significantly enhance productivity, improve accessibility, and provide valuable insights from previously untapped data sources. Amazon Transcribe offers a robust, accurate, and scalable solution for automated transcription, making it an essential tool for any company looking to leverage audio data effectively. In this tutorial, we’ll guide you through the process of using Amazon Transcribe with AWS Lambda and DynamoDB to seamlessly extract and manage text from audio files, regardless of your familiarity with AWS.

Transcription methods in AWS are separated into two categories:

Batch transcriptions: process audio data asynchronously, in batches or groups. Transcription starts once the files have been uploaded and submitted to the service. Suitable for pre-recorded audio.
Streaming transcriptions: process audio data in real time, with the audio sent continuously to the transcription service. Suitable for real-time applications such as broadcasts and virtual meetings.

When performing batch transcription, the media file and the resulting transcript are stored in Amazon S3. If you do not specify an output bucket, AWS uses a secure service-managed bucket by default and gives you access to the transcript through a temporary URI.

When you use this default bucket, the transcript is stored there once the transcription job completes, and the temporary URI points to the file that contains the transcript text. Each temporary URI is valid for only 15 minutes, and the items in the bucket are automatically deleted after 90 days.
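As a rough illustration of the two storage options described above, the sketch below builds the arguments for the boto3 start_transcription_job call. The helper name build_job_request and the bucket name in the usage note are hypothetical and not part of the original tutorial. Leaving out OutputBucketName is what makes Transcribe fall back to its service-managed bucket.

```python
def build_job_request(job_name, media_uri, language_code, output_bucket=None):
    """Build keyword arguments for transcribe_client.start_transcription_job.

    build_job_request is a hypothetical helper used only for illustration.
    """
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
    }
    if output_bucket is not None:
        # With OutputBucketName set, the transcript is written to your own bucket.
        request["OutputBucketName"] = output_bucket
    # Without it, Transcribe keeps the transcript in a service-managed bucket.
    return request
```

With a boto3 Transcribe client, this could be used as transcribe_client.start_transcription_job(**build_job_request("job_name1234567890", "s3://my-bucket/recordings/MyfirstRecording0.wav", "es-ES")), where my-bucket is a placeholder.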

You can retrieve the job information from a Lambda function by calling the Transcribe client's GetTranscriptionJob method, which returns fields such as TranscriptionJobStatus and MediaFormat.

response = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
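To show where this call could sit inside a Lambda function, here is a minimal sketch. It assumes the job name arrives in the event under a job_name key, which is an invented convention, and the helper get_job_summary is hypothetical as well. The client is passed in as a parameter so the helper can be exercised without AWS credentials.

```python
def get_job_summary(transcribe_client, job_name):
    """Return a few fields from the GetTranscriptionJob response."""
    response = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
    job = response["TranscriptionJob"]
    return {
        "job_name": job["TranscriptionJobName"],
        "status": job["TranscriptionJobStatus"],
        "media_format": job.get("MediaFormat"),
    }


def lambda_handler(event, context):
    # boto3 is bundled with the AWS Lambda Python runtimes.
    import boto3

    client = boto3.client("transcribe")
    # Assumption: the caller puts the job name in event["job_name"].
    return get_job_summary(client, event["job_name"])
```

The Lambda execution role also needs permission to call transcribe:GetTranscriptionJob.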

Here is an example of a GetTranscriptionJob response:

{
  "TranscriptionJob": {
    "TranscriptionJobName": "job_name1234567890",
    "TranscriptionJobStatus": "COMPLETED",
    "LanguageCode": "es-ES",
    "MediaSampleRateHertz": 8000,
    "MediaFormat": "wav",
    "Media": {
      "MediaFileUri": "https://s3-us-east-1.amazonaws.com/default_transcriptions/recordings/MyfirstRecording0.wav"
    },
    "Transcript": {
      "TranscriptFileUri": "https://s3.us-east-1.amazonaws.com/aws-transcribe-us-east-1-prod/1234567890/job_name1234567890/asrOutput.json?X-Amz-Security-Token=EXAMPLETOKEN&X-Amz-Algorithm=AWS4-EXAMPLE-Amz-Date=20240422T143557Z&X-Amz-SignedHeaders=host&X-Amz-Expires=899&X-Amz-Credential=AWS-EXAMPLE_request&X-Amz-Signature=dkjshfkjfhksjdhfksjdhfsdjfhskjfdh"
    },
    "StartTime": "2024-04-19T03:23:56.842Z",
    "CreationTime": "2024-04-19T03:23:56.826Z",
    "CompletionTime": "2024-04-19T03:24:22.402Z",
    "Settings": {
      "ChannelIdentification": false,
      "ShowAlternatives": false
    }
  }
}

Depending on the status of the transcription job, you may receive different results. When the status is “COMPLETED”, the job is finished and you can get the URI of the transcription file. If the status is “FAILED”, the response includes a FailureReason field explaining why the transcription job failed.
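The status handling described above can be sketched as a small function. The name describe_job and the fallback message are hypothetical. The QUEUED and IN_PROGRESS statuses are handled together as "not ready yet".

```python
def describe_job(response):
    """Return (status, detail) for a GetTranscriptionJob response dictionary."""
    job = response["TranscriptionJob"]
    status = job["TranscriptionJobStatus"]
    if status == "COMPLETED":
        # The transcript is ready: detail is the transcription file URI.
        return status, job["Transcript"]["TranscriptFileUri"]
    if status == "FAILED":
        # FailureReason explains what went wrong.
        return status, job.get("FailureReason", "no reason reported")
    # QUEUED or IN_PROGRESS: there is nothing to download yet.
    return status, None
```

A caller can then branch on the returned status and only attempt the download when the detail is a URI.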

The transcription file URI is a URL that points to the location where the transcription file is stored after a transcription job is completed.

transcript_file_uri = response['TranscriptionJob']['Transcript']['TranscriptFileUri']

This file contains the result of the transcription process in JSON format. Depending on the settings, the content may vary.

If access is denied when you open the URL of the transcription file, the temporary URI has most likely expired. To get a new temporary URI, make another GetTranscriptionJob request.

To fetch the JSON file from where it’s stored, you need to make an HTTP GET request to the transcription file URI:

data = http.request('GET', transcript_file_uri)

Once the request is made, we obtain the response data. The body is returned as bytes, so you first decode it into a UTF-8 string and then parse it with the json.loads() method. After this, the data is a Python dictionary containing the JSON content of the transcription file.

Example of decoding and loading the JSON file:

import json
import urllib3

http = urllib3.PoolManager()
data = http.request('GET', transcript_file_uri)
data = json.loads(data.data.decode('utf-8'))
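Because the temporary URI expires, a download can fail with an access-denied error. The sketch below shows one possible way to request a new URI and try again. It uses the standard library's urllib.request in place of urllib3 so that it runs without third-party packages, and it assumes that an expired URI shows up as HTTP 403. The function name download_transcript and its parameters are hypothetical.

```python
import urllib.error
import urllib.request


def download_transcript(transcribe_client, job_name,
                        opener=urllib.request.urlopen, attempts=2):
    """Download the transcript bytes, asking for a new temporary URI after a 403."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        # Each GetTranscriptionJob call returns a new temporary URI.
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name)
        uri = response["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
        try:
            with opener(uri) as http_response:
                return http_response.read()
        except urllib.error.HTTPError as err:
            # Assumption: 403 means the URI expired. Other errors, and a 403
            # on the last attempt, are re-raised to the caller.
            if err.code != 403 or attempt == attempts - 1:
                raise
```

The returned bytes can be decoded and parsed exactly as in the example above.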

Finally, we extract the transcription text from the loaded JSON data. Here is an example of the transcript structure:

{
  "jobName": "my-first-transcription-job",
  "accountId": "111122223333",
  "results": {
    "transcripts": [
      { "transcript": "Welcome to Amazon Transcribe." }
    ],
    "items": [
      {
        "start_time": "0.64",
        "end_time": "1.09",
        "alternatives": [ { "confidence": "1.0", "content": "Welcome" } ],
        "type": "pronunciation"
      },
      {
        "start_time": "1.09",
        "end_time": "1.21",
        "alternatives": [ { "confidence": "1.0", "content": "to" } ],
        "type": "pronunciation"
      },
      {
        "start_time": "1.21",
        "end_time": "1.74",
        "alternatives": [ { "confidence": "1.0", "content": "Amazon" } ],
        "type": "pronunciation"
      },
      {
        "start_time": "1.74",
        "end_time": "2.56",
        "alternatives": [ { "confidence": "1.0", "content": "Transcribe" } ],
        "type": "pronunciation"
      },
      {
        "alternatives": [ { "confidence": "0.0", "content": "." } ],
        "type": "punctuation"
      }
    ]
  },
  "status": "COMPLETED"
}

Based on this structure, we can extract the text in the following way:

text = data['results']['transcripts'][0]['transcript']

Text output (string):

“Welcome to Amazon Transcribe.”
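The same structure also carries per-word timing and confidence in the items list. As a small sketch, the function below collects each spoken word with its start time, end time and confidence. The name list_words is hypothetical, and taking the first entry of alternatives as the word to keep is an assumption made for this example.

```python
def list_words(data):
    """Return (word, start, end, confidence) tuples from a parsed transcript."""
    words = []
    for item in data["results"]["items"]:
        if item["type"] != "pronunciation":
            # Punctuation items carry no start_time or end_time.
            continue
        # Assumption: the first alternative is the one to keep.
        best = item["alternatives"][0]
        words.append((
            best["content"],
            float(item["start_time"]),
            float(item["end_time"]),
            float(best["confidence"]),
        ))
    return words
```

For the transcript structure shown earlier, the first tuple would be ("Welcome", 0.64, 1.09, 1.0).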

Amazon Transcribe is a powerful tool that offers more than just basic transcription; it opens the door to advanced features like real-time transcription, custom vocabulary, and language identification. This tutorial has only scratched the surface of what Amazon Transcribe can do. By incorporating this service into your workflow, you can unlock new levels of efficiency and data analysis for your business. To fully explore the potential of Amazon Transcribe and discover additional ways it can benefit your organization, we encourage you to delve deeper into its extensive capabilities. If you’re looking to speed up the implementation process and manage these solutions effectively, our team is here to help build and customize these solutions to fit your specific needs. Reach out to us for expert guidance and support.

Contact us today if you need help for a project involving Transcribe
