Text-To-Speech (TTS) Module¶
Overview¶
The Text-To-Speech
module enables your Alexa Auto SDK client application to synthesize Alexa speech on demand from a text or Speech Synthesis Markup Language (SSML) string. To synthesize speech, this module uses the Text-To-Speech-Provider
module. The Auto SDK does not provide any speech-playing APIs. Your application's TTS module integration is responsible for playing the synthesized speech to deliver a unified Alexa experience to the user.
Note: This feature may only be used with voice-guided turn-by-turn navigation.
Important! The
Text-To-Speech
module requires the Local Voice Control extension.
Configuring the Text-To Speech-Module¶
The Text-To-Speech
module does not require Engine configuration.
Using the Text-To-Speech AASB Messages¶
Prepare Speech¶
To request speech synthesis from a text or SSML input, your application must publish the PrepareSpeech
message. The Engine publishes either the PrepareSpeechCompleted
message or PrepareSpeechFailed
message to indicate success or failure, respectively.
Click to expand or collapse sequence diagram: Prepare Speech
Note: The
prepareSpeechFailed
API contains thereason
parameter that specifies the error string for failure. Refer to the TTS provider errors for more information on errors defined by the TTS provider.TThe TTS module defines the
REQUEST_TIMED_OUT
error that occurs when the TTS provider sends no response, causing the speech request to time out. The timeout value is 1000 milliseconds.
Get Capabilities¶
To request the capabilities of the TTS provider being used, your application must publish the GetCapabilities
message. The Engine publishes the GetCapabilitiesReply
message reply with the capabilities of the TTS provider.
Click to expand or collapse sequence diagram: Get Capabilities
Integrating the Text-To-Speech Module Into Your Application¶
C++ MessageBroker Integration¶
Use the MessageBroker
to subscribe to and publish TextToSpeech
AASB messages.
Click to expand or collapse C++ sample code
#include <AACE/Core/MessageBroker.h>
#include <AASB/Message/TextToSpeech/TextToSpeech/GetCapabilitiesMessage.h>
#include <AASB/Message/TextToSpeech/TextToSpeech/PrepareSpeechCompletedMessage.h>
#include <AASB/Message/TextToSpeech/TextToSpeech/PrepareSpeechFailedMessage.h>
#include <AASB/Message/TextToSpeech/TextToSpeech/PrepareSpeechMessage.h>
#include <nlohmann/json.hpp>
using json = nlohmann::json;
class MyTextToSpeechHandler {
// Subscribe to messages from the Engine
void MyTextToSpeechHandler::subscribeToAASBMessages() {
m_messageBroker->subscribe(
[=](const std::string& message) { handlePrepareSpeechCompletedMessage(message); },
PrepareSpeechCompletedMessage::topic(),
PrepareSpeechCompletedMessage::action());
m_messageBroker->subscribe(
[=](const std::string& message) { handlePrepareSpeechFailedMessage(message); },
PrepareSpeechFailedMessage::topic(),
PrepareSpeechFailedMessage::action());
m_messageBroker->subscribe(
[=](const std::string& message) { handleGetCapabilitiesReplyMessage(message); },
GetCapabilitiesMessageReply::topic(),
GetCapabilitiesMessageReply::action());
}
// Handle the PrepareSpeechCompleted message from the Engine
void MyTextToSpeechHandler::handlePrepareSpeechCompletedMessage(const std::string& message) {
PrepareSpeechCompletedMessage msg = json::parse(message);
std::string speechId = msg.payload.speechId;
std::string streamId = msg.payload.streamId;
std::string metadata = msg.payload.metadata;
prepareSpeechCompleted(speechId, streamId, metadata);
}
// Handle the PrepareSpeechFailed message from the Engine
void MyTextToSpeechHandler::handlePrepareSpeechFailedMessage(const std::string& message) {
PrepareSpeechFailedMessage msg = json::parse(message);
std::string speechId = msg.payload.speechId;
std::string reason = msg.payload.reason;
prepareSpeechFailed(speechId, reason);
}
// Handle the GetCapabilities reply message from the Engine
void MyTextToSpeechHandler::handleGetCapabilitiesReplyMessage(const std::string& message) {
GetCapabilitiesMessageReply msg = json::parse(message);
std::string messageId = msg.header.messageDescription.replyToId;
std::string capabilities = msg.payload.capabilities;
// ...Handle capabilities of the TTS provider...
}
// To prepare speech, publish the PrepareSpeech message to the Engine
void MyTextToSpeechHandler::prepareSpeech(
const std::string& speechId,
const std::string& text,
const std::string& provider,
const std::string& options) {
PrepareSpeechMessage msg;
msg.payload.speechId = speechId;
msg.payload.text = text;
msg.payload.provider = provider;
msg.payload.options = options;
m_messageBroker->publish(msg.toString());
}
// To get capabilities, publish the GetCapabilities message to the Engine
std::string MyTextToSpeechHandler::getCapabilities(
const std::string& requestId,
const std::string& provider) {
GetCapabilitiesMessage msg;
msg.header.id = requestId;
msg.payload.provider = provider;
m_messageBroker->publish(msg.toString());
// The Engine will send the GetCapabilitiesReply message
// Return the capabilities from reply message payload
}
void MyTextToSpeechHandler::prepareSpeechCompleted(
const std::string& speechId,
const std::string& streamId,
const std::string& metadata) {
// Use MessageBroker openStream API to get the MessageStream
std::shared_ptr<MessageStream> preparedAudio =
m_messageBroker->openStream(msg.payload.streamId, MessageStream::Mode::READ);
// Follow the UX guidelines in order to play the audio stream
}
// Notification of a failed speech synthesis
void TextToSpeechHandler::prepareSpeechFailed(
const std::string& speechId,
const std::string& reason) {
// Use the speechId to correlate the synthesis request to the result
// Access the reason for failure
}
};