Unleashing the power of Text to Voice using Amazon Polly and Python

I recently had to come up with a solution to quickly turn text into voice and make it available as an MP3 file. Of course, being an AWS person, I knew exactly what to do! Amazon Polly to the rescue. Amazon Polly is a simple-to-use service and, in this blog post, I'll show you how to get started with it in Python.

What is Amazon Polly?

Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to convert written text into lifelike speech. It comes with dozens of high-quality, natural-sounding voices in various languages. You can quickly test them and find the one that meets your requirements and specific use case.

Polly is extremely simple to use and allows developers to quickly create applications that interact with users through a human-like voice. It supports a wide range of use cases, from enhancing accessibility in applications to creating engaging voiceovers for multimedia content.

Getting Started with Amazon Polly in Python:

Step 1: Set Up Your AWS Account

Before diving into Amazon Polly, ensure you have an AWS account. Set up an IAM (Identity and Access Management) user with the necessary permissions to interact with Polly. The policy below allows the listed Polly actions and explicitly denies deleting lexicons:

{
   "Version": "2012-10-17",
   "Statement": [{
      "Sid": "AllowAllActionsDenyDelete",
      "Effect": "Allow",
      "Action": [
         "polly:DescribeVoices",
         "polly:GetLexicon",
         "polly:PutLexicon",
         "polly:SynthesizeSpeech",
         "polly:ListLexicons"],
      "Resource": "*"
      },
      {
      "Sid": "DenyDeleteLexicon",
      "Effect": "Deny",
      "Action": [
         "polly:DeleteLexicon"],
      "Resource": "*"
      }
   ]
}
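If you would rather generate the policy document than hand-edit JSON, a minimal sketch like the one below builds the same permissions as a Python dictionary and serializes it. The helper name build_polly_policy is my own and not part of any AWS library, and the Sid values are written without punctuation on the assumption that the policy will be attached in IAM, which accepts only letters and digits there.

```python
import json


def build_polly_policy():
    """Return the Polly IAM policy from this step as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Sid kept alphanumeric, as IAM expects for identity-based policies.
                "Sid": "AllowAllActionsDenyDelete",
                "Effect": "Allow",
                "Action": [
                    "polly:DescribeVoices",
                    "polly:GetLexicon",
                    "polly:PutLexicon",
                    "polly:SynthesizeSpeech",
                    "polly:ListLexicons",
                ],
                "Resource": "*",
            },
            {
                "Sid": "DenyDeleteLexicon",
                "Effect": "Deny",
                "Action": ["polly:DeleteLexicon"],
                "Resource": "*",
            },
        ],
    }


# json.dumps always emits syntactically valid JSON, so a stray or
# missing comma between statements cannot slip through.
policy_json = json.dumps(build_polly_policy(), indent=3)
print(policy_json)
```

The printed text can be pasted into the IAM console's JSON policy editor.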

Step 2: Install Boto3

Boto3 is the AWS SDK for Python, and you'll need it to interact with Polly. Install it using the following command:

pip install boto3

Step 3: Create a Polly Client

In your Python script, create a Polly client using the Boto3 library:

import boto3

# Create a Polly client; replace 'your-region' with an AWS region such as 'us-east-1'
polly_client = boto3.client('polly', region_name='your-region')
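With a client in hand, you can also browse the available voices from code rather than the console. The sketch below relies on the client's describe_voices call and its LanguageCode filter. The helper name voice_ids_for_language and the 'en-US' default are illustrative choices of mine, and the helper takes the client as a parameter so it can be tried with the polly_client created in this step.

```python
def voice_ids_for_language(polly_client, language_code="en-US"):
    """Return a sorted list of voice IDs for one language (hypothetical helper).

    Assumes polly_client is a Boto3 Polly client, e.g. boto3.client('polly', ...).
    """
    voice_ids = []
    params = {"LanguageCode": language_code}
    while True:
        response = polly_client.describe_voices(**params)
        voice_ids.extend(voice["Id"] for voice in response["Voices"])
        # The response may be paginated; follow NextToken until it is absent.
        next_token = response.get("NextToken")
        if not next_token:
            break
        params["NextToken"] = next_token
    return sorted(voice_ids)
```

Calling print(voice_ids_for_language(polly_client)) should list the IDs you can pass as VoiceId; the exact set depends on your region and on which voices AWS currently offers.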

Step 4: Synthesize Speech

Now, you can use Polly to synthesize speech from text:

audiofile = 'voicefile.mp3'

response = polly_client.synthesize_speech(
    Text='Hello, you are now using Amazon Polly!',
    VoiceId='Joanna',
    OutputFormat='mp3'
)

# Save the audio file; the with block closes it even if the write fails
with open(audiofile, 'wb') as file:
    file.write(response['AudioStream'].read())
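For longer clips you may prefer not to hold the whole MP3 in memory at once. The following sketch copies the stream to disk in chunks and closes both the stream and the file even if a write fails. The helper name save_audio_stream and the 8 KB chunk size are my own choices; it assumes only that the AudioStream object offers read(size) and close(), which the Boto3 streaming body does.

```python
from contextlib import closing


def save_audio_stream(audio_stream, path, chunk_size=8192):
    """Copy a readable binary stream to path in chunks; return bytes written.

    Hypothetical helper: audio_stream is expected to be response['AudioStream'].
    """
    total = 0
    # closing() guarantees audio_stream.close() runs when the block exits.
    with closing(audio_stream), open(path, "wb") as out_file:
        while True:
            chunk = audio_stream.read(chunk_size)
            if not chunk:
                break  # an empty read means the stream is exhausted
            out_file.write(chunk)
            total += len(chunk)
    return total
```

With the response from this step, usage would look like save_audio_stream(response['AudioStream'], 'voicefile.mp3').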

A complete script that puts it all together, including playing the audio file through your system's audio device, can be found in this gist

Conclusion

Amazon Polly, coupled with the versatility of Python, provides an easy, low-friction, and affordable way to integrate text-to-speech capabilities into your applications. Whether you're building a voice-driven application, adding accessibility features, or creating engaging multimedia content, Amazon Polly is the service for you :)