Cloud Technology

In today’s digital landscape, extracting transcripts from audio and video files is a crucial task for many industries, from journalism to academia. Amazon Transcribe simplifies this process by converting audio into text accurately and efficiently. In this guide, we’ll walk you through the steps to extract transcripts using Lambda functions integrated with DynamoDB.

Transcription methods in AWS are separated into two categories:

  • Batch transcriptions: process audio data asynchronously. Transcription starts once the files have been uploaded and a job has been submitted. Suitable for pre-recorded audio.
  • Streaming transcriptions: process audio data in real time, with data sent continuously to the transcription service. Suitable for real-time applications such as broadcasts and virtual meetings.

When performing batch transcription, the media and the resulting transcript are stored in Amazon S3. If you do not specify an output bucket, AWS uses a secure service-managed bucket with a temporary URI by default.
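As a rough sketch of how a batch job might be submitted with or without your own output bucket, the helper below builds the arguments for boto3's start_transcription_job call. The function names, the job name, and the bucket names are hypothetical; only the request keys come from the Transcribe API.

```python
def build_job_request(job_name, media_uri, language_code, media_format,
                      output_bucket=None):
    """Build keyword arguments for Transcribe's StartTranscriptionJob call."""
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }
    # Without OutputBucketName, the transcript goes to the service-managed
    # bucket and is only reachable through a temporary URI.
    if output_bucket is not None:
        request["OutputBucketName"] = output_bucket
    return request


def start_batch_job(transcribe_client, **job_kwargs):
    """Submit the job; transcribe_client is assumed to be boto3.client('transcribe')."""
    request = build_job_request(**job_kwargs)
    return transcribe_client.start_transcription_job(**request)
```

Passing output_bucket="my-transcripts-bucket" (a placeholder name) would keep the transcript in a bucket you control; leaving it out relies on the default behaviour described above.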


When using the default S3 bucket for Amazon Transcribe, the transcript of a completed job is stored in that bucket and exposed through a temporary URI. This URI is valid for only 15 minutes, and the bucket items are automatically deleted after 90 days.

You can retrieve the job information from a Lambda function by calling the Transcribe client's GetTranscriptionJob method, which returns fields such as TranscriptionJobStatus and MediaFormat.

response = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)

Here is an example of the response:

{
  "TranscriptionJob": {
    "TranscriptionJobName": "job_name1234567890",
    "TranscriptionJobStatus": "COMPLETED",
    "LanguageCode": "es-ES",
    "MediaSampleRateHertz": 8000,
    "MediaFormat": "wav",
    "Media": {
      "MediaFileUri": "https://s3-us-east-1.amazonaws.com/default_transcriptions/recordings/MyfirstRecording0.wav"
    },
    "Transcript": {
      "TranscriptFileUri": "https://s3.us-east-1.amazonaws.com/aws-transcribe-us-east-1-prod/1234567890/job_name1234567890/asrOutput.json?X-Amz-Security-Token=EXAMPLETOKEN&X-Amz-Algorithm=AWS4-EXAMPLE-Amz-Date=20240422T143557Z&X-Amz-SignedHeaders=host&X-Amz-Expires=899&X-Amz-Credential=AWS-EXAMPLE_request&X-Amz-Signature=dkjshfkjfhksjdhfksjdhfsdjfhskjfdh"
    },
    "StartTime": "2024-04-19T03:23:56.842Z",
    "CreationTime": "2024-04-19T03:23:56.826Z",
    "CompletionTime": "2024-04-19T03:24:22.402Z",
    "Settings": {
      "ChannelIdentification": false,
      "ShowAlternatives": false
    }
  }
}
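As a minimal sketch, the lookup could be wrapped in a Lambda handler like the one below. The event shape (a "job_name" key) and the summarize_job helper are assumptions made for illustration; they are not part of the Transcribe API.

```python
def summarize_job(response):
    """Pick a few fields out of a GetTranscriptionJob response."""
    job = response["TranscriptionJob"]
    return {
        "name": job["TranscriptionJobName"],
        "status": job["TranscriptionJobStatus"],
        "media_format": job.get("MediaFormat"),
        "language_code": job.get("LanguageCode"),
    }


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; it is imported here so
    # that summarize_job stays usable in environments without it.
    import boto3

    transcribe_client = boto3.client("transcribe")
    # Assumption: the invoking event carries the job name under "job_name".
    job_name = event["job_name"]
    response = transcribe_client.get_transcription_job(
        TranscriptionJobName=job_name
    )
    return summarize_job(response)
```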

Depending on the status of the transcription job, you may receive different results. When the status is “COMPLETED”, the job has finished and the response contains the URI of the transcription file. If the status is “FAILED”, the response includes a FailureReason field explaining why the transcription job failed.
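One way to act on the status is sketched below. The function name is hypothetical, and returning None for unfinished jobs is a design choice of this sketch, not something the service prescribes.

```python
def transcript_uri_or_none(response):
    """Return the transcript URI for a finished job, None if still running."""
    job = response["TranscriptionJob"]
    status = job["TranscriptionJobStatus"]

    if status == "COMPLETED":
        return job["Transcript"]["TranscriptFileUri"]

    if status == "FAILED":
        # FailureReason is only present on failed jobs.
        reason = job.get("FailureReason", "unknown reason")
        raise RuntimeError(f"Transcription job failed: {reason}")

    # QUEUED or IN_PROGRESS: nothing to download yet.
    return None
```

A caller that gets None can simply check again later.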

The transcription file URI is a URL that points to the location where the transcription file is stored after a transcription job is completed.

transcript_file_uri = response['TranscriptionJob']['Transcript']['TranscriptFileUri']

This file contains the result of the transcription process in JSON format. Depending on the settings, the content may vary.

If access is denied when opening the URL of the transcription file, the temporary URI has probably expired. To get a new temporary URI, make another GetTranscriptionJob request.

To fetch the JSON file from where it’s stored, you need to make an HTTP GET request to the transcription file URI:

data = http.request('GET', transcript_file_uri)

Once the request is made, we obtain the response data. The body arrives as bytes, so first decode it into a UTF-8 string and then parse it with the json.loads() function. The result is a Python dictionary containing the JSON content of the transcription file.

Example of decoding and loading the JSON file:

import json
import urllib3

http = urllib3.PoolManager()
http_response = http.request('GET', transcript_file_uri)
data = json.loads(http_response.data.decode('utf-8'))  # bytes -> str -> dict
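Because the temporary URI expires, a download can be denied even for a completed job. The sketch below retries once with a fresh URI when the server answers 403. The function name and the get_fresh_uri callable are hypothetical; http is assumed to behave like a urllib3.PoolManager.

```python
import json


def fetch_transcript(http, get_fresh_uri, max_attempts=2):
    """Download and parse the transcript JSON, refreshing the URI if denied.

    http:          object with urllib3.PoolManager's request() method
    get_fresh_uri: hypothetical callable returning a new TranscriptFileUri,
                   for example by calling GetTranscriptionJob again
    """
    for _ in range(max_attempts):
        uri = get_fresh_uri()
        result = http.request("GET", uri)
        if result.status == 200:
            return json.loads(result.data.decode("utf-8"))
        if result.status != 403:
            raise RuntimeError(f"Unexpected HTTP status {result.status}")
        # 403: the temporary URI has likely expired, so loop for a new one.
    raise RuntimeError("Transcript URI still denied after refreshing")
```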

Finally, we extract the transcription text from the loaded JSON data. Here is an example of the transcript structure:

{
  "jobName": "my-first-transcription-job",
  "accountId": "111122223333",
  "results": {
    "transcripts": [
      { "transcript": "Welcome to Amazon Transcribe." }
    ],
    "items": [
      { "start_time": "0.64", "end_time": "1.09", "alternatives": [ { "confidence": "1.0", "content": "Welcome" } ], "type": "pronunciation" },
      { "start_time": "1.09", "end_time": "1.21", "alternatives": [ { "confidence": "1.0", "content": "to" } ], "type": "pronunciation" },
      { "start_time": "1.21", "end_time": "1.74", "alternatives": [ { "confidence": "1.0", "content": "Amazon" } ], "type": "pronunciation" },
      { "start_time": "1.74", "end_time": "2.56", "alternatives": [ { "confidence": "1.0", "content": "Transcribe" } ], "type": "pronunciation" },
      { "alternatives": [ { "confidence": "0.0", "content": "." } ], "type": "punctuation" }
    ]
  },
  "status": "COMPLETED"
}

Based on this structure, we can extract the text in the following way:

text = data['results']['transcripts'][0]['transcript']

Text output (string):

“Welcome to Amazon Transcribe.”
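The same structure also carries word-level detail. As a small sketch (the function name is hypothetical), the items list can be walked to collect each spoken word with its timing and confidence:

```python
def words_with_timing(data):
    """Return (word, start, end, confidence) tuples for spoken items."""
    words = []
    for item in data["results"]["items"]:
        # Punctuation items carry no start_time/end_time, so skip them.
        if item["type"] != "pronunciation":
            continue
        best = item["alternatives"][0]
        words.append((
            best["content"],
            float(item["start_time"]),
            float(item["end_time"]),
            float(best["confidence"]),
        ))
    return words
```

For the example transcript above, the first tuple would be ("Welcome", 0.64, 1.09, 1.0).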


Author

msuarez