Text-to-Speech Service ¶
Text-to-Speech (TTS) Service provides an implementation on the Android platform to synthesize non-Alexa speech on demand from text. It implements the Android abstract TextToSpeechService
class, which is part of Android TTS framework APIs, and in the backend uses the Text-to-Speech functionality provided by the Auto SDK. Android applications can interact with these standard Android TTS APIs to convert text to speech.
Table of Contents¶
- Overview
- Architecture
- Configuration
- Sample Usage
- Initialization
- Get Capabilities
- Synthesize Audio
- Known Issues
Overview¶
Text-to-Speech Service implements TextToSpeechService, which is an abstract base class for TTS engine implementations. The following list describes the responsibilities of the service and the Android TTS framework:
-
The service is responsible for:
- retrieving the TTS capabilities
- preparing the TTS audio and vending out the stream to the Android TTS framework APIs
-
The Android TTS framework is responsible for:
- handling the states of TTS requests
- playing out the audio
- saving the audio to file
The Text-to-Speech Service communicates with AACS to get the available locales and fetch the synthesized audio stream. Therefore, AACS must be started, configured, and running in connected state to ensure the Text-to-Speech Service can generate the designated TTS audio correctly.
Important! The Text-to-Speech Service requires the Local Voice Control and Alexa Custom Assistant extensions to function.
Architecture¶
The following diagram shows the high-level architecture of TTS Service and AACS. The blue box in the diagram represents the Text-to-Speech Service and the white boxes in the diagram represent the other components developed by Amazon. These components are all packaged in AARs to be added along with AACS AAR into Alexa Auto App or your application. The other boxes represent components that do not belong to Amazon: The yellow boxes represent the components from the Android TTS framework, and the green box represents the OEM application using Android TTS APIs.
- Android Application is owned by OEMs to handle the Text-to-Speech interaction with the end user by creating an instance of the
TextToSpeech
object and issuing Text-to-Speech requests. - TextToSpeech is the Facade class that acts as a bridge between the Text-to-Speech application, which issues the TTS requests, and the underlying
TextToSpeechService
, which renders the response. See Android documentation for TextToSpeech here. - TextToSpeechService is an abstract base class for TTS engine implementations. For more information about this class, see Android documentation for TextToSpeechService.
- AmazonTextToSpeechService is the actual implementation of
TextToSpeechService
that communicates with AACS via the IPC library to issue Text-to-Speech requests. - Core Service is responsible for accepting TTS requests from
AmazonTextToSpeechService
and for routing those requests to the Auto SDK'sTextToSpeech
platform interface, which issues a request to the appropriate TTS provider. For information about TTS providers, see Auto SDKText-To-Speech-Provider
module documentation.
Configuration¶
To specify the intent targets for AASB message intents from AACS, follow the instructions in AACS documentation. The TTS Service defines a list of intent filters in its Android manifest to subscribe to specific AASB message intents. If your application uses static configuration to specify the target for AASB topics, provide the following information as part of the AACS configuration to enable the communication between TTS Service and AACS.
{ ...
"aacs.general": {
"intentTargets": {
"AASB": {
"type": [
"RECEIVER"
],
"package": [
"com.amazon.alexaautoclientservice"
],
"class": [
"com.amazon.aacstts.TTSIntentReceiver"
]
},
"AlexaClient": {
"type": [
"RECEIVER"
],
"package": [
"com.amazon.alexaautoclientservice"
],
"class": [
"com.amazon.aacstts.TTSIntentReceiver"
]
},
"TextToSpeech": {
"type": [
"RECEIVER"
],
"package": [
"com.amazon.alexaautoclientservice"
],
"class": [
"com.amazon.aacstts.TTSIntentReceiver"
]
},
...
}
AASB
, AlexaClient
, and TextToSpeech
topics by using a broadcast receiver. Make sure that you properly populate the type, the package name, and the class name into the intentTargets
JSON node for these topics, as shown in the JSON example above.
Sample Usage¶
Your application uses the Text-to-Speech Service in the same way as it would use any TTS engine. To synthesize speech, the application must create a TextToSpeech
object and then set the input text. The application can also specify the speech pitch and rate.
Initialization¶
Initialize the Text-to-Speech Service in one of two ways:
-
If the Android application creates a
TextToSpeech
object without specifying the package name of the TTS engine to use, the default TTS engine is used.You can change the default engine to Text-to-Speech Service by following these steps:// Create the TextToSpeech using the default engine TextToSpeech textToSpeech = new TextToSpeech(getApplicationContext(), new TextToSpeech.OnInitListener() { @Override public void onInit(int status) { if (status != TextToSpeech.ERROR) { // do things upon TextToSpeechService is initialized } } });
- Open the Android Settings menu and navigate "Preferred Engine" as follows:
app → “General Management” → “Language and Input” → "Text-to-Speech" → "Preferred Engine"
- Choose "Amazon Text-to-Speech Engine" in the list.
-
The Android application can also directly create Text-to-Speech Service by specifying the package name "com.amazon.alexaautoclientservice" when creating the
TextToSpeech
object.You must initiate TTS Service at least once to warm up the language cache before making any synthesis requests.// Create the TextToSpeech by specifying the package name of the TextToSpeechService TextToSpeech textToSpeech = new TextToSpeech(getApplicationContext(), new TextToSpeech.OnInitListener() { @Override public void onInit(int status) { if (status != TextToSpeech.ERROR) { // do things upon TextToSpeechService is initialized } } }, "com.amazon.alexaautoclientservice");
Get Capabilities¶
The Android applications may query the Text-to-Speech Service to get the available voices. For information about voices, see Android Voice class. For information about languages, see Locale class. The following examples show how to get the available voices and languages from the Engine:
// Query the engine about the set of available voices
Set<Voice> voices = textToSpeech.getVoices();
// Query the engine about the set of available languages.
Set<Locale> locales = textToSpeech.getAvailableLanguages();
// Check if the specified language as represented by the Locale is available and supported.
int supported = textToSpeech.isLanguageAvailable(Locale.US)
Synthesize Audio¶
The Android TTS framework provides two ways of synthesizing TTS audio:
- Play out the audio immediately: The Android applications may call the
speak
API.public int speak (CharSequence text, int queueMode, Bundle params, String utteranceId)
- Save to a WAV file: The Android applications may call the
synthesizeToFile
API.public int synthesizeToFile (CharSequence text, Bundle params, File file, String utteranceId)
Known Issues¶
Conversion of MP3 to RAW Audio for TTS on the X86 platform is not yet supported.